The LLM ban is unenforceable; they must know this. Is it there to scare off the most obvious cases, and to provide an easy way to kick people off the project when the evidence is incomplete?
It is enforceable. I think you mean that it cannot be prevented, since people can attempt to hide their usage. Most rules and laws are like that: you proscribe some behavior, but that doesn't prevent people from doing it. That's why you typically also need to define punishments:
> This policy is not open to discussion, any content submitted that is clearly labelled as LLM-generated (including issues, merge requests, and merge request descriptions) will be immediately closed, and any attempt to bypass this policy will result in a ban from the project.
What happens when the PR is clear, reasonable, short, checked by a human, clearly fixes, implements, or otherwise improves the code base, and has no alternative implementation that is reasonably different from the initially presented version?
If you're going to set a firm "no AI" policy, then my inclination would be to treat that kind of PR in the same way the US legal system does evidence obtained illegally: you say "sorry, no, we told you the rules and so you've wasted effort -- we will not take this even if it is good and perhaps the only sensible implementation". Perhaps somebody else will eventually re-implement it later without looking at the AI PR.
How funny would it be if the path to actually implementing that thing were then cut off because a PR with the exact same patch had already been submitted. I'm honestly sitting here grinning at the absurdity on display. Some things can only be done a certain way, especially when you're working with third-party libraries and APIs. The name of the function is the name of the function. There's no way around it.
It follows the same reasoning as when someone purposefully copies code from one codebase into another whose license doesn't allow it.
Yes, it might be the only viable solution, and most likely no one will ever know you copied it, but if you are found out, most maintainers will not merge your PR.
That's why I said "somebody else, without looking at it". Clean-room reimplementation, if you like. The functionality is not forever unimplementable, it is only not implementable by merging this AI-generated PR.
It's similar to how I can't implement a feature by copying-and-pasting the obvious code from some commercially licensed project. But somebody else could write basically the same thing independently without knowing about the proprietary-license code, and that would be fine.
Your not realizing how ridiculous this is is exactly why half of all devs are about to get left behind.
Like, this should be enshrined as the quintessential “they simply, obstinately, perilously, refused to get it” moment.
Shortly, no one is going to care about anyone’s bespoke manual keyboard entry of code if it takes 10 times as long to produce the same functionality with imperceptibly less error.
> Shortly, no one is going to care about anyone’s bespoke manual keyboard entry of code if it takes 10 times as long to produce the same functionality with imperceptibly less error.
Well that day doesn't appear to be coming any time soon. Even after years of supposed improvements, LLMs make mistakes so frequently that you can't trust anything they put out, which completely negates any time savings from not writing the code.
1) Most people still don't use TDD, which absolutely solves much of this.
2) Most people end up leaning too heavily on the LLM, which, well, blows up in their face.
3) Most people don't follow best practices or designs, which the LLM absolutely does NOT know about NOR does it default to.
4) Most people ask it to do too much and then get disappointed when it screws up.
Perfect example:
> you can't trust anything they put out
Yeah, that screams "missing TDD that you vetted" to me. I have yet to see it fail to correctly pass a test that I've vetted (at least in the past 2 months). Learn how to be a good dev first.
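For what it's worth, the "vetted test first" workflow described above can be sketched in a few lines. Everything here is a made-up illustration (the `slugify` function and its expected behavior are hypothetical, not from this thread):

```python
import re

# Step 1: the human writes and vets this test FIRST, pinning down
# the expected behavior before any implementation exists.
def test_slugify():
    assert slugify("Hello, World!") == "hello-world"
    assert slugify("  Multiple   spaces ") == "multiple-spaces"

# Step 2: only then is the LLM asked for an implementation; it is
# accepted solely because it passes the vetted test, not because
# its output "looks right".
def slugify(title: str) -> str:
    s = title.strip().lower()
    return re.sub(r"[^a-z0-9]+", "-", s).strip("-")

test_slugify()
print("vetted test passed")
```

The point of the ordering is that the human, not the model, decides what "correct" means; the test is the contract the generated code must satisfy.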
> no one is going to care about anyone’s bespoke manual keyboard entry of code if it takes 10 times as long to produce the same functionality with imperceptibly less error.
No one is going to care about anyone’s painstaking avoidance of chlorofluorocarbons if it takes ten times as long to style your hair with imperceptibly less ozone hole damage.
This is a non-argument. All of the cloud LLMs are going to move to things like micro-nuclear power, and the scientific advances AI might enable may also help mitigate downstream problems from the carbon footprint.
The problem is that even if the code is clear and easy to understand AND it fixes a problem, it still might not be suitable as a pull request. Perhaps it changes the code in a way that would complicate other work in progress or planned and wouldn't just be a simple merge. Perhaps it creates a vulnerability somewhere else or additional cognitive load to understand the change. Perhaps it adds a feature the project maintainer specifically doesn't want to add. Perhaps it just simply takes up too much of their time to look at.
There are plenty of good reasons why somebody might not want your PR, independent of how good or useful to you your change is.
How would you tell that it's LLM-generated in that case?
If the submitter is prepared to explain the code and vouch for its quality then that might reasonably fall under "don't ask, don't tell".
However, if LLM output is either (a) uncopyrightable or (b) considered a derivative work of the source that was used to train the model, then you have a legal problem. And the legal system does care about invisible "bit colour".
Both sides written by an LLM. Both sides written based on my explicit prompts explaining exactly how I want it to behave, then testing, retesting, and generally doing all the normal software eng due diligence necessary for basic QA. Sometimes the prompts are explicitly "change this variable name" and it ends up changing 2 lines of code no different from a find/replace.
Also I'm watching it reason in real time by running terminal commands to probe runtime data and extrapolate the right code. I've already seen it fix basic bugs because an RFC wasn't adhered to perfectly. Even leaving a nice comment explaining why we're ignoring the RFC in that one spot.
Eventually these arguments get kind of exhausting. People will use it to build stuff, the stuff they build ends up retraining it, and we're already hundreds of generations deep into that retraining, so talking about licenses at this point feels absurd to me.
You may as well be the MPAA right now throwing threats around sharing MP3s. We're past the point of caring and the laws will catch up with reality eventually. The US copyright office says things that get turned over in court all the time.
Tell me, how have laws “caught up with” “the [RIAA…] throwing threats around sharing MP3s?” So far as I know that’s still considered copyright infringement and the person doing it, if caught, can be liable for very substantial statutory damages.
It sounds like you really can’t handle being told “no, you can’t use an LLM for this” by someone else, even if they have every right to do so. You should probably talk to your therapist about that.
The entire industry is encouraging LLM use all day, every day right now at big corps, including mine. If your argument is that the code we are producing isn't the copyright of our employers, you won't get very far. Call it the realpolitik of tech if you want.
The simplest refutation of your point of view is this: who or what is responsible if the submitted work is wrong?
It will always be the person’s, never the computer’s. Conveniently, AI always acts as if it has no skin in the game… because it literally and figuratively doesn’t… so people who treat it as if it does should be penalized.
You sound like someone who has literally zero understanding as to why that is a ridiculous comparison.
There are a thousand and one ways that I participate when building something with LLM assistance. Everything from ORIGINATING AN IDEA TO BEGIN WITH, to working on a thorough spec for it, to ensuring tests are actually valid, to asking for specific designs like hexagonal design, to specific things like benchmarks... literally ALL OF THE INITIATIVE IS MINE, AND ALL OF THE SUCCESS/FAILURE CONSEQUENCES ARE MINE, AND THAT IS ULTIMATELY ALL THAT MATTERS
Please head towards a different career if you now have a stupid and contrived excuse not to continue working with the machines, because you sound like a whining child
And you're not answering the question, because you know it would end your point: WHO OR WHAT IS RESPONSIBLE IF THE CODE SUCCEEDS OR FAILS?
I started working in the industry when you were able to buy a Lisp Machine new and have been studying AI even longer, and I’ve been very successful in it. I not only know what I’m talking about, I have the experience to back it up.
You sound like someone who’s deeply in denial about exactly how the LLM plagiarism machines work. You really do sound like a student defending themselves against a plagiarism charge by asserting that since they did the work of choosing the text to put into their essay and massaging the grammar so it fit, nobody should care where it came from.
By that definition, every single human who wrote a paper after reading a source document is a “plagiarism machine”
and I’m 53 and well remember Symbolics from freshman year at Cornell, in fact my application essay to it was about fuzzy logic (AI-tangential) and probably got me in, so I too am quite familiar
I’m also quite good at debate. The flaw in your logic is that plagiarism requires accountability, and no machine can be accountable, only the human that used it; ergo, it is still the work of the human, because the human values, the human vets, the human initiates, and the human gains or loses based on the combined output, end of story. Accelerated thought is still thought, and anyway, if a machine can replicate thought, then it wasn’t particularly original to begin with.
Yes, what happens when the murder looks like a heart attack? This isn't hypothetical; some assassinations happen exactly like this. That doesn't make murder laws unenforceable.
Lots of people try to get away with perfect crimes and sometimes do. That doesn't make the rule unenforceable, it just highlights the limits of human knowledge in the face of a dishonest person. Hence the escalations for trying to destroy evidence of crimes or in this case to work around the AI policy. Here, instead of just closing your PR, they ban you if you try to hide it.
I think the bigger point about enforcement is not whether you're able to detect "content submitted that is clearly labelled as LLM-generated", but that banning presumes you can identify the origin, i.e., that any individual contributor is known to have (at most) one identity.
Once identity is guaranteed, privileges basically come down to reputation — which in this case is a binary "you're okay until we detect content that is clearly labelled as LLM-generated".
[Added] Note that identity (especially preventing duplicate identities) is not easily solved.
Well, unenforceable isn't a synonym for undetectable or awkward. Their policy indicates that they are aware of this difficulty: if you admit to using AI then they close your pull request, if you do not admit to using AI but evidence later surfaces that you did then they ban you. They can enforce this.
The hope here is the same hope as most laws: that lies eventually catch up to people. That truth comes to light. But sure, in the meanwhile, there are always dishonest people around trying to flout rules to varying degrees of success. Some are caught right away, some live their entire lives without it catching up to them. That doesn't make the rule unenforceable, that just highlights the limits of rules: it requires evidence that can be hard to come by.
Real people in the real world understand that rules don’t simply cease to exist because there’s no technical means of guaranteeing their obedience. You simply ask people to follow them, and to affirm that they’re following them whether explicitly or implicitly, and then mete out severe social consequences for being a filthy fucking liar.
There’s this thing called “honor” where if you tell someone that they need to affirm their contribution is their own work and not created with an LLM, most people most of the time will tell the truth—especially if the “no LLMs” requirement is clearly stated up front.
You’re basically saying that a “no-LLMs” rule doesn’t matter, because dishonorable people exist. That’s not how most people work, and that’s not how rules work.
When we encounter a sociopath or liar, we point them out and run them out of our communities before they can do more damage, we don’t just give up and tolerate or even welcome them.
I suspect this is for now just a rough filter to remove the lowest effort PRs. It likely will not be enough for long, though, so I suspect we will see default deny policies soon enough, and various different approaches to screening potential contributors.
Any sufficiently advanced LLM-slop will be indistinguishable from regular human-slop. But that’s what they are after.
This heuristic lets the project flag problematic slop with minimal investment, avoiding the cost of reviewing low-quality, low-effort, high-volume contributions, which is close to ideal.
Much like banning pornography on an artistic photo site, perfect application of the rule at its borderline matters far less than the filtering power “I know it when I see it” provides in the standard case. Plus, smut peddlers aren’t likely to set an OpenClaw bot-agent swarm loose arguing the point with you for days, then post blog and Medium articles attacking you personally for “discrimination”.
It's a sign to point at when someone posts "I asked AI to fix this and got this". You can stop reading and stop arguing right there, saving a lot of time and effort.
Just require that the CLA/Certificate of Origin statement be printed out, signed, and mailed with an envelope and stamp, where besides attesting that they appropriately license their contributions ((A)GPL, BSD, MIT, or whatever) and have the authority to do so, that they also attest that they haven't used any LLMs for their contributions. This will strongly deter direct LLM usage. Indirect usage, where people whip up LLM-generated PoCs that they then rewrite, will still probably go on, and go on without detection, but that's less objectionable morally (and legally) than trying to directly commit LLM code.
As an aside, I've noticed a huge drop off in license literacy amongst developers, as well as respect for the license choices of other developers/projects. I can't tell if LLMs caused this, but there's a noticeable difference from the way things were 10 years ago.
> As an aside, I've noticed a huge drop off in license literacy amongst developers
What do you mean by this? I always assumed this was the case anyway; MIT is, if I'm not mistaken, one of the mostly used licenses. I typically had a "fuck it" attitude when it came to the license, and I assume quite a lot of other people shared that sentiment. The code is the fun bit.
> I always assumed this was the case anyway; MIT is, if I'm not mistaken, one of the mostly used licenses
No, it wasn't that way in the 2000s, e.g., on platforms like SourceForge, where OSS devs would go out of their way to learn the terms and conditions of the popular licenses and made sure to respect each other's license choices, and usually defaulted to GPL (or LGPL), unless there was a compelling reason not to: https://web.archive.org/web/20160326002305/https://redmonk.c...
Not being able to publish anything without sifting through all the libs' licences? Remembering legalese, jurisprudence, and edge cases on top of everything else?
MIT became ubiquitous because it gives us peace of mind
You have to go through all the dependencies anyway, to roughly judge their quality, and the activity of their maintainers. Quickly looking at the license doesn't take any more effort.
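As a rough illustration of how cheap that quick look can be, here is a sketch using Python's standard-library `importlib.metadata` to list the declared license of every installed dependency. This is my own example, not a workflow from the thread, and it only reads the self-declared `License` metadata field, which can be missing or inaccurate:

```python
from importlib.metadata import distributions

def dependency_licenses() -> dict[str, str]:
    """Map each installed distribution to its declared License metadata field."""
    out = {}
    for dist in distributions():
        name = dist.metadata.get("Name", "unknown")
        out[name] = dist.metadata.get("License") or "UNKNOWN"
    return out

# Print a quick license inventory of the current environment.
for pkg, lic in sorted(dependency_licenses().items()):
    print(f"{pkg}: {lic}")
```

For anything serious you would still read the license text itself, but even this level of glance surfaces obvious surprises (e.g. an AGPL dependency in a proprietary product) for essentially zero effort.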
Sarcasm? Nobody will contribute with a complex signing process like that, and it doesn't guarantee anything in the end; it's like a high-tech pinky swear.
Lots of projects have had requirements like this for years, usually to prevent infection by (A)GPL's virality, or in the case of the FSF, so they can sue on your behalf, or less scrupulously, so the project can re-license itself or dual license itself in the future should the maintainers opt to. (This last part was traditionally the only part that elicited objections to CLAs.)
> it's like a high tech pinky swear
So is attesting that you didn't contribute any GPL'd code (which, incidentally, you arguably can't do if you're using LLMs trained on GPL'd code), and no one seemed to have issues with that. Yet when it's extended to LLMs, the concern trolling starts in earnest. It's also legally binding.