So maybe one day we'll see coding agents like Claude Code create and update an ATTRIBUTION.md, citing all the open source projects and their licenses used to generate code in your project?
You got it exactly right :) And you can update the attribution.md to have it NOT rely on opensource projects that have been compromised. Imagine asking claude code to write a package/function in the style of a codebase that you care about or force it to ALWAYS rely on some internal packages that you care about. The possibilities are endless when you insert such knobs into models.
Doesn’t the nature of most open source licenses allow for AI training though?
Example — MIT:
> Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions
I remember seeing some new licenses like Human license or something iirc but they all had the valid criticism that it would be unenforcable or hard to catch decision.
I haven't looked at the project that much but this could seem exciting to me if maybe these two things can get merged.
I don't think that license is necessarily the problem in here. Licenses can change and people can adopt new licenses.
Only if there's a commercial incentive to do so methinks. Just one of the things where I expect a legal catch-up is needed to get companies to do the right thing.
This is actually being built right now. ATTRIBUTION.md (https://attribution.md) is a protocol that does exactly this. You drop a file in your repo root with a few lines of YAML, and it asks AI coding agents to prompt users to star the repos they built on.
The key design choice is that it does not automate anything. The agent surfaces a prompt, the user decides yes or no. No bulk starring, no forced actions. The spec also deliberately stays out of licensing territory. It is purely a social recognition layer.
Not as long as all developers add an ATTRIBUTION.md citing all open source projects they read the source for, all companies they worked for and trained them and all Stack Overflow answers they have used for write the code.
> Not as long as all developers add an ATTRIBUTION.md citing all open source projects they read the source for, all companies they worked for and trained them and all Stack Overflow answers they have used for write the code.
Oh? You are under the impression that software gets the same rights and privileges of humans?
Or maybe you are under the impression that you are so special that you face no danger from having no income because the models already ingested all your work and can launder it effectively?
I don't consider it a logical fallacy so much as a philosophical debate on art vs theft that exists in both human and AI worlds.
IMO Nothing and nobody starts out original. We need copying to learn, to build a foundation of knowledge and understanding. Everything is a copy of something else (or put another way, art is more like a sum of your influences). The only difference is how much is actually copied, and how obvious it is.
And in the US at least, from a legal perspective, this "how obvious is it" subjective test is often one way that copyright disputes are settled.
For example there have been many cases of similar sounding songs that either did in fact draw an influence from an existing track (whether consciously or not), or were more likely just coincidental... but courts have ruled both ways in such cases, even if they sound extremely similar.