Factors 3 and 4 are what that argument is based on, I believe: 3) on the basis that the input is not _reproduced_ in the output, and 4) on similar grounds, that output which is nothing like the input data isn't affecting the market for the original image (I think this is the more debatable one, but in general the existing cases have struggled at the early stages because the plaintiffs have not been able to actually point to output that is a copy of their part of the input, and this does actually matter).
>Directly competing with those whose data was copied.
An LLM doesn't compete with art, the same way that Photoshop doesn't compete with art.
>All of it, from everyone.
With the result that anything produced by the LLM does not reproduce any single source in its entirety (and where a model can be coerced into doing that, it's a bug, not a feature).
Fair use is too specific tbh. Rather than ruling it fair use (which seems to be where things are going), it should just be ruled "use". There's nothing wrong with building a mathematical model using available data.
> An LLM doesn't compete with art, the same way that Photoshop doesn't compete with art.
Yes, it does. Many people are using AI-generated works in places where they originally would have either paid an artist, programmer, or other creative professional, or done without. Many companies are claiming to reduce staff because of AI (whether that's true or an excuse). There is plenty of evidence that AI is directly competing with various individuals, businesses, and industries.
> With the result that anything produced by the LLM does not reproduce any single source in its entirety
You do not have to reproduce sources in their entirety to produce derivative works.
Tools compete with Tools. Operators of tools compete with other tool operators. The tool doesn't compete in the same market as the operator. Lowering the barrier of entry for being a tool operator is cool and good actually.
>You do not have to reproduce sources in their entirety to produce derivative works.
True, but if there's no significant % of the original in the derivative, it doesn't matter. You need to actually make the positive case and clearly identify the injured party, or it's just noise. This actually happened one time, where a legal firm loaded another party's data into an LLM and had it regenerate the data. The judge found that the result infringed despite the LLM use, which makes sense. But pointing at some weird AI-generated boomer comic, you can't identify any injured party. It's slop made from enough unique sources that there's no victim, much like most derivative art forms. Something that's 0.1% like each of 1000 different sources, times random noise, can't cause injury. It's not recognizably derivative in any sense except style, which isn't protected.
The four factors of fair use in the US:
> the purpose and character of your use
Commercial, for-profit. Not scholarship, not research, not commentary, not parody, etc.
> the nature of the copyrighted work
Absolutely everything. Artistic, creative, not purely factual.
> the amount and substantiality of the portion taken, and
All of it, from everyone.
> the effect of the use upon the potential market.
Directly competing with those whose data was copied.