AI-Generated Code Rewrites and Open Source Licensing: A Legal Frontier Without Clear Boundaries
Lawrence Pemberton · March 11, 2026



The release of chardet version 7.0.0, an AI-assisted rewrite of a popular Python library, has sparked an intense legal and ethical dispute over whether code generated by an AI model can escape the licensing obligations of the original work it was built to replace. The case exposes a fundamental gap between intellectual property frameworks designed for human developers and the operational realities of modern AI coding tools. Prominent voices across the open source community are divided, and no court has yet provided definitive guidance on how software licensing applies when artificial intelligence serves as the primary author.

The release of a single Python library update has ignited one of the most consequential legal and ethical debates the open source software community has faced in years. At its center is a deceptively simple question: when an AI model generates an entirely new codebase inspired by an existing project, does the resulting work inherit the legal obligations of the original?

The controversy crystallized with the publication of chardet version 7.0.0, an overhaul of a widely used Python library for automatic character encoding detection. Originally authored by developer Mark Pilgrim in 2006 and released under an LGPL license, chardet had long carried licensing restrictions that limited its adoption in proprietary and closed-source software environments. Dan Blanchard, who assumed maintenance of the repository in 2012, harbored ambitions of integrating chardet into the Python standard library — ambitions consistently frustrated by the library's restrictive license and by its performance and accuracy shortcomings.
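For readers unfamiliar with the library, chardet's job is to guess the character encoding of raw bytes. The following stdlib-only sketch illustrates the problem it solves — trying candidate encodings and returning the first that decodes cleanly — though chardet's real implementation uses statistical models, not this naive cascade:

```python
# Naive sketch of automatic encoding detection, the problem chardet
# solves. Try candidate encodings from strictest to most permissive
# and return the first that decodes without error.
# Illustration only: this is NOT chardet's actual algorithm.

CANDIDATES = ["ascii", "utf-8", "cp1252", "latin-1"]

def detect_encoding(raw: bytes) -> str:
    for enc in CANDIDATES:
        try:
            raw.decode(enc)
            return enc
        except UnicodeDecodeError:
            continue
    return "latin-1"  # latin-1 maps every byte, so it always decodes

print(detect_encoding(b"plain text"))            # ascii
print(detect_encoding("héllo".encode("utf-8")))  # utf-8
```

A cascade like this cannot distinguish, say, cp1252 from latin-1 for most inputs; chardet's value is precisely that it scores byte-frequency patterns to make that call, returning a best guess with a confidence value.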

Rather than undertaking a conventional manual rewrite, Blanchard turned to Claude Code, Anthropic's AI-powered coding assistant. The results were striking. According to Blanchard, the overhaul was completed "in roughly five days" and delivered a 48x performance boost. More significantly, the new version was declared a "ground-up, MIT-licensed rewrite" — a relicensing that would remove the restrictions that had long stood in chardet's way.


Pilgrim did not accept that characterization without challenge. Resurfacing on GitHub, he contended that Blanchard's extensive familiarity with the original codebase disqualified the new version from being considered an independent work. Traditional "clean room" reverse engineering requires a strict firewall between those who analyze existing code and those who write the replacement — a separation that was never established here. As Pilgrim argued directly:

"Their claim that it is a 'complete rewrite' is irrelevant, since they had ample exposure to the originally licensed code (i.e., this is not a 'clean room' implementation). Adding a fancy code generator into the mix does not somehow grant them any additional rights. I respectfully insist that they revert the project to its original license."

Blanchard's counterargument rests on the premise that the standards governing human clean room implementations should not be applied wholesale to AI-generated code. He acknowledges having had "extensive exposure to the original codebase" but asserts that the AI-generated output is "qualitatively different" from its predecessor and "is structurally independent of the old code." To substantiate this claim, he cited structural similarity analysis using JPlag, which found that a maximum of 1.29 percent of any chardet version 7.0.0 file is structurally similar to the corresponding file in version 6.0.0. By contrast, comparisons between earlier versions revealed similarity rates as high as 80 percent in some corresponding files.
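JPlag itself is a Java tool that compares token streams rather than raw text, so identifier renames and reformatting do not register as differences. A rough Python sketch of that idea — comparing sequences of AST node types instead of source characters — might look like this (again an illustration, not JPlag's actual algorithm):

```python
# Rough sketch of structure-based similarity in the spirit of JPlag:
# compare sequences of AST node-type names, so variable renames and
# formatting changes do not count as differences.
# Illustration only: JPlag's real algorithm is more sophisticated.
import ast
from difflib import SequenceMatcher

def structure(src: str) -> list[str]:
    """Flatten a module's AST into a sequence of node-type names."""
    return [type(node).__name__ for node in ast.walk(ast.parse(src))]

def structural_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1]; 1.0 means identical structure."""
    return SequenceMatcher(None, structure(a), structure(b)).ratio()

old = "def detect(data):\n    result = scan(data)\n    return result\n"
new = "def sniff(buf):\n    r = scan(buf)\n    return r\n"  # renames only
pct = structural_similarity(old, new) * 100
print(f"{pct:.1f}% structurally similar")  # prints "100.0% structurally similar"
```

The example shows why a pure rename scores as fully similar under this kind of analysis, and conversely why the 1.29 percent figure Blanchard reports is a claim about structure, not about behavior: two codebases can score near zero on such a metric while implementing the same algorithm.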

Blanchard summarized his position plainly in the GitHub thread:

"No file in the 7.0.0 codebase structurally resembles any file from any prior release. This is not a case of 'rewrote most of it but carried some files forward.' Nothing was carried forward."

His described methodology was deliberate. Blanchard began by specifying an architecture in a design document and drafting requirements for Claude Code. He then "started in an empty repository with no access to the old source tree and explicitly instructed Claude not to base anything on LGPL/GPL-licensed code." The process was anchored by what he described as a "wipe it clean" commit — a fresh repository start intended to structurally isolate the new work from the old.

Yet the case is far from clear-cut, and several complicating factors deserve serious consideration. First, Claude explicitly relied on some metadata files from previous versions of chardet, introducing a tangible point of direct continuity between the old and new codebases. Second, and more philosophically troubling, Claude's underlying models are trained on reams of data pulled from the public Internet — which means the AI almost certainly ingested prior chardet source code during its training. Whether that embedded prior knowledge constitutes a form of derivation, even when the output is structurally novel, remains an unresolved legal question.

The human dimension adds yet another layer of complexity. Blanchard was not a passive observer of the AI's output. He described his role as follows: "I reviewed, tested, and iterated on every piece of the result using Claude. … I did not write the code by hand, but I was deeply involved in designing, reviewing, and iterating on every aspect of it." The degree to which a developer's intimate knowledge of the original code shaped that iterative review process may prove legally significant.

Prominent voices across the open source ecosystem have weighed in with sharply divergent views. Free Software Foundation Executive Director Zoë Kooyman drew a firm line:

"There is nothing 'clean' about a Large Language Model which has ingested the code it is being asked to reimplement."

Open source developer Armin Ronacher offered a contrasting perspective, pushing back against philosophical arguments that conflate behavioral similarity with legal derivation:

"If you throw away all code and start from scratch, even if the end result behaves the same, it's a new ship."

The legal terrain surrounding AI-generated software remains largely unsettled at the institutional level. Courts have ruled that an AI system cannot be named as the inventor on a patent or hold copyright in a work of art, but no definitive ruling has addressed what those principles mean for software licensing when code is generated in whole or in part by an AI system. Experts note that questions about potential "tainting" of open source licenses through AI-generated contributions become complex remarkably quickly.

Beyond the immediate legal dispute, the broader implications are what many observers find most significant. If AI tools can enable rapid, low-effort rewrites of licensed open source projects — effectively resetting license obligations in days rather than years — the foundational economics of the open source model are at risk. Italian developer Salvatore "antirez" Sanfilippo framed the shift in structural terms:

"Now the process of rewriting is so simple to do, and many people are disturbed by this. There is a more fundamental truth here: the nature of software changed; the reimplementations under different licenses are just an instance of how such nature was transformed forever. Instead of combating each manifestation of automatic programming, I believe it is better to build a new mental model and adapt."

Open source advocate Bruce Perens reached for stronger language to convey the magnitude of the moment:

"I'm breaking the glass and pulling the fire alarm! The entire economics of software development are dead, gone, over, kaput! … We have been there before, for example when the printing press happened and resulted in copyright law, when the scientific method proliferated and suddenly there was a logical structure for the accumulation of knowledge. I think this one is just as large."

The chardet episode may ultimately be remembered less for its resolution than for the questions it forced into the open. Legal frameworks built around human authorship and clean room separation were not designed with AI intermediaries in mind. As AI coding tools become faster, more capable, and more widely adopted, the gap between existing intellectual property law and the realities of modern software development will only continue to widen — making this dispute not an isolated incident, but a preview of far larger conflicts to come.

