Hacker News: modeless's comments

> Humans will naturally prefer the auditory experience of an occasional dropped packet, vs backed up audio or audio that plays at an uneven rate

Yes, but the difference here is that there is only one human in the conversation. The other side can tolerate a 200ms delay in receiving or sending perfectly well, because it is not constrained to run in exactly real time the way a human brain is.

I think he is right. This is an interesting point that I hadn't considered before. The reason we skip 200ms instead of pausing for 200ms when we get missed packets in a WebRTC call is because we can't pause the human on the other side of the call. But we can pause AI just fine.


> The reason we skip 200ms instead of pausing for 200ms when we get missed packets in a WebRTC call is because we can't pause the human on the other side of the call. But we can pause AI just fine.

This isn't about pausing anyone; it's about doing faster-than-realtime processing after a delay event. Humans can do that to some extent, and this is in fact done by some voice applications like Microsoft Teams, where after a network interruption the audio is sometimes played back really fast until it catches up to real time again.

I hope it's an intentional design decision, because it works really well (for me). I can often perfectly keep track of a conversation in spite of the network delay. As much as I hate Teams, its meetings and voice implementation (also noise cancellation) work quite well, especially compared to current open source solutions like Jitsi or BigBlueButton.
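As a rough illustration of the catch-up behaviour described in this comment: if playback runs at a speed-up factor `rate` > 1, each wall-clock second drains `rate - 1` seconds of backlog, so a backlog of B seconds takes B / (rate - 1) to clear. The 1.25x rate below is an assumed value for illustration, not a figure Teams is known to use.

```python
def catch_up_seconds(backlog_s: float, rate: float) -> float:
    """Wall-clock time needed to drain a playback backlog.

    Playing at `rate` times real time consumes `rate` seconds of audio per
    wall-clock second while live audio keeps arriving at 1 second per second,
    so the backlog shrinks by (rate - 1) seconds every second.
    """
    if rate <= 1.0:
        raise ValueError("rate must exceed 1.0, otherwise the backlog never drains")
    return backlog_s / (rate - 1.0)


# Assumed example: 200 ms of backed-up audio replayed at 1.25x speed.
print(catch_up_seconds(0.2, 1.25))  # prints 0.8
```

The formula shows the trade-off: halving the catch-up time means doubling the speed-up margin (rate - 1), and listeners only tolerate modest speed-ups.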


Yes, it's about pausing. You pause the AI so it doesn't need to perceive the 200ms gap at all, unlike a human who will always perceive the interruption. Yes, then you run faster than real time to catch up.

Yes, humans can listen to audio faster than real time to catch up, but it degrades the experience and there is a fairly low limit to it. When talking to an AI you don't have to skip or speed up at all on the human side, is the point.
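A toy simulation can make the skip-versus-pause contrast concrete. It assumes a human listener with a fixed playout deadline (a hypothetical 100 ms jitter-buffer depth), where any frame arriving after its deadline is skipped, versus an AI consumer that simply waits for every frame and processes them in order. The packet timings are made up, and the faster-than-real-time catch-up the model would do afterwards is not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Packet:
    seq: int           # frame sequence number
    sent_ms: float     # when the sender emitted the frame
    arrived_ms: float  # when the frame reached the receiver


def human_playout(packets: list[Packet], buffer_ms: float) -> tuple[list[int], list[int]]:
    """Real-time playout: each frame must be played by sent_ms + buffer_ms.

    A frame that arrives after its deadline is skipped, because the
    listener's clock keeps running regardless.
    """
    played: list[int] = []
    skipped: list[int] = []
    for pkt in sorted(packets, key=lambda item: item.seq):
        if pkt.arrived_ms <= pkt.sent_ms + buffer_ms:
            played.append(pkt.seq)
        else:
            skipped.append(pkt.seq)
    return played, skipped


def ai_consume(packets: list[Packet]) -> tuple[list[int], float]:
    """A model can simply wait: every frame is processed, in order.

    Returns the processed sequence numbers and the worst-case wait, i.e. how
    long the model had to pause for the slowest frame.
    """
    ordered = sorted(packets, key=lambda item: item.seq)
    worst_wait_ms = max(pkt.arrived_ms - pkt.sent_ms for pkt in ordered)
    return [pkt.seq for pkt in ordered], worst_wait_ms


# Hypothetical 20 ms frames; frame 2 is held up by roughly 200 ms in the network.
frames = [Packet(0, 0, 30), Packet(1, 20, 50), Packet(2, 40, 260), Packet(3, 60, 90)]
print(human_playout(frames, buffer_ms=100))  # ([0, 1, 3], [2])
print(ai_consume(frames))                    # ([0, 1, 2, 3], 220)
```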


Yeah, that's a really good way of framing the argument; I wish I'd written that. The way robots listen and respond is bounded by compute, not time. Buffering audio isn't a great experience for humans, but it definitely works for robots.

i haven't used the openai voice thing

but, if it's trying to respond in a natural way, with interruptions in both directions, it may still be a good idea. if there's a delay between you stopping and it starting talking, it feels weird

(you might be able to fake some of that on the client, but then you need a thicker client)


Which LLM can generate text so quickly a real-time conversation is viable?

There are now realtime “speech-to-speech” models [0]. I believe they skip text to streamline the architecture.

[0]: https://openai.com/index/introducing-gpt-realtime/


They stopped pursuing cutting-edge fabrication processes many years ago. It always seemed like a short-sighted business decision to me.

I read about it at the time. Apparently, every new node was twice as expensive to develop as the previous one, which killed off the smaller competitors one by one. Even today, I find it hard to tell whether GlobalFoundries really made a mistake. They get to cash out on their existing fabs and a bit of sustaining innovation for a long time, instead of potentially killing themselves swiftly by trying and failing to keep up with the leading edge.

Considering the market cap of their competitors who kept going it seems foolish to have given up entirely.

When I first heard about Mojo I somehow got the impression that they intended to make it compatible with existing Python code. But it seems like they are very far away from that for the foreseeable future. I guess you can call back and forth between Python and Mojo but Mojo itself can't run existing Python code.

In their original pitch that was definitely part of it: take Python code, add type hints, get a big speedup. As they've built it out it seems to have diverged.

It was always going to be a long-term thing, if it were even possible. You can't make a compiler that can compile Python into efficient machine code in just a year (which was how long Mojo had been in development when it was announced).

The messaging was changed because people got sold too hard on that, and kept trying Mojo with the expectation that it could compile existing Python code when it couldn't. What Modular did was change the messaging to reflect what Mojo is today, and provide a roadmap[1] of what they hope it'll turn into in the future. As it evolves, the messaging will evolve with it to continue reflecting current capabilities.

1. https://mojolang.org/docs/roadmap/


They also advertised a 36,000x speedup over equivalent Python, if I remember correctly, without at any point clarifying that this could only be true in extreme edge cases. Feels more like a pump-and-dump cryptography scheme than an honest attempt to improve the Python ecosystem.

Well... the article made self-deprecating fun of the clickbait title, showed the code every step of the way, and actually did achieve the claim (albeit with wall-clock time, not CPU/GPU time).

And it wasn't "equivalent Python", whatever that means; they did loop unrolling and SIMD and stuff. That can't be done in pure Python at all, so there literally is no equivalent Python.


Watch Chris Lattner's interview with Lex Fridman. He talks about Mojo as a 36,000x speedup over Python without any indication that you need to think about vectorization to achieve it.

I'm looking at this transcript and I'm getting a different picture than what you describe https://podscripts.co/podcasts/lex-fridman-podcast/381-chris... . Yeah, he doesn't specifically say vectorization and multi-threading or whatever, but he also doesn't say you don't need some skill to get to huge speedups.

Does he say that you _do_ need skill to get huge speedups?

In fairness, it's been a long time since I watched this, but I remember being struck by how obviously dishonest Lattner was throughout. For example, at one point he talks about approaching Mojo from a first-principles perspective, using the speed of light as a limiting factor for what's computationally possible. Complete bullshit. You'd have to be working at the hardware layer for that to even begin to be relevant, and even then photonic computation is years away. It's essentially technobabble.


The speed of light is actually the limiting factor in modern chips and has been for a long time. It's one of the major reasons why process shrinks bring jumps in performance: literally, the distance signals have to travel in the CPU shrinks.

Electrons moving through silicon do not move at the speed of light.

First of all, the speed of electricity is not the speed of electrons. Secondly, the difference here is not very big.

Indeed, an electromagnetic pulse travelling through a copper wire will propagate at something like 85% of the speed of light.
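To put the propagation figure in perspective, the distance a signal covers in one clock cycle is v / f. The sketch below assumes the ~85% of c figure from this comment and a 5 GHz clock; both are illustrative, and signals on real on-chip wiring are typically slower than in a copper cable because of wire resistance and capacitance.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact, by definition of the metre


def distance_per_cycle_cm(clock_hz: float, velocity_factor: float) -> float:
    """How far a signal propagates during one clock period, in centimetres."""
    period_s = 1.0 / clock_hz
    return SPEED_OF_LIGHT_M_PER_S * velocity_factor * period_s * 100


# Assumed values: 5 GHz clock, signal travelling at 85% of c.
print(f"{distance_per_cycle_cm(5e9, 0.85):.2f} cm")  # prints "5.10 cm"
```

A few centimetres per cycle is the same order of magnitude as a large processor die, which is why wire length shows up as a real constraint.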

Leaving aside that this is still distinct from the speed of light, can you tell me how this would have in any way influenced the design of Mojo?


I re-read that passage where he mentions the speed of light. Clearly he's trying to explain that if you start from the bottom, from what the hardware can do, and try to get all of that into the programmer's hands, then you have something that is close to the theoretical speed. This is different from starting at Python and going down the stack to see where you can optimize.

I think you're a bit too angry already to interpret what he's saying as how he meant it. You seem to be twisting everything to make it sound stupid.


The modern way to advertise: lie a lot.

Crypto*

If you paid very close attention it was actually clear from the start that the idea was to build a next gen systems language, taking the lessons from Swift and Rust, targeting CPU/GPU/Heterogeneous targets, and building around MLIR. But then also building it with an eye towards eventually embedding/extending Python relatively easily. The Python framing almost certainly helped raise money.

Chris Lattner talked more about the relationship between MLIR and Mojo than Python and Mojo.


So basically Chapel, which is actually being used in HPC.

I don't know Chapel in detail, I was more thinking Hylo. I don't think Chapel has a clear value/reference semantics or ownership/lifetime story? Am I wrong here?

The Mojo docs include two sections dedicated to these topics:

https://mojolang.org/docs/manual/values/

https://mojolang.org/docs/manual/lifecycle/

The metaprogramming story seems to take inspiration from Zig, but the way comptime, parameters and ownership blend in Mojo seems relatively novel to me (as a spectator/layman):

https://mojolang.org/docs/manual/metaprogramming/

I was sort of paying attention to all these ideas and concepts two or three years ago from the sidelines (partially with the idea to learn how Julia could potentially evolve), but it's far from my area of expertise, so I might well be getting stuff wrong.


You make use of 'owned', 'shared', 'unmanaged', 'borrowed'.

https://chapel-lang.org/docs/language/spec/classes.html#clas...


I see, it seems like the design is not complete and still a work in progress (which is the same for Mojo's Origins concept, I think):

"The details of lifetime checking are not yet finalized or specified. Additional syntax to specify the lifetimes of function returns will probably be needed."

I think Rust proved that lifetimes, ownership and borrow checking can be useful for a mainstream language. The discussions in the Mojo context revolve around how to improve the ergonomics of these compared with Rust.


Unlike Mojo, it is open source, and plenty of people are using it in HPC.

https://hpsf.io/blog/2026/hpsf-project-communities-to-gather...

https://developer.hpe.com/platform/chapel/home

See "Projects Powered by Chapel".


So? What point are you making? A different language with different design philosophy, has success in a different niche than Mojo is targeting?

One is used in production already by key laboratories in HPC research, the other wants to be and is far away from being 1.0.

Chapel's current version is 2.8.0.


I don't understand this framing. So what? C++ and Julia are more widely adopted and used in HPC; that does not mean people shouldn't start learning new languages.

In the LLM age, maybe the focus should be elsewhere instead of syntax.

Is that so? People still read their code to understand it, ask about it, or make modifications. Even in the LLM age, language design and readability are as relevant as before.

I don't think superficial comparisons along the lines of "why this new Y when we have X" are really helpful. Languages and systems get adopted not only for their stated goals but for their underlying capabilities and good design, which translate into a better user experience and ecosystem growth.


Mojo isn't that far away from 1.0. Sometime this year is the target.

I don't think Mojo is targeting HPC at all.

Is it? Spack has only one package that depends on Chapel.

That was what was originally advertised: they wanted to be what Kotlin is to Java, but for Python. They quickly turned tail on this.

That, and the not-completely-open-source development model, is what has always made it feel like vaporware to me.


That's because Mojo told you that. https://web.archive.org/web/20231221132631/https://docs.modu...

> Our long-term goal is to make Mojo a superset of Python (that is, to make Mojo compatible with existing Python programs). Python programmers should be able to use Mojo immediately, and be able to access the huge ecosystem of Python packages that are available today.


Mojo has refocused on Python interoperability vs. superset, though yes, the original idea was being a superset.

It's possible the language evolves to that in the long term, but it's not the short-term goal.

We published a Mojo roadmap on Mojolang.org that helps contextualize this: https://mojolang.org/docs/roadmap/

Note: I work at Modular


From the site:

> Python interop: Mojo natively interoperates with Python so you can eliminate performance bottlenecks in existing code without rewriting everything. You can start with one function, and scale up as needed to move performance-critical code into Mojo. Your Mojo code imports naturally into Python and packages together for distribution. Likewise, you can import libraries from the Python ecosystem into your Mojo code.


> they intended to make it compatible with existing Python code

That was the original claim, but it was quietly removed from the website. (Did they fall for the common “Python is a simple language” misconception?)

Now they promise I can “write like Python”, but don’t even support fundamentals like classes (which are part of stage 3 of the roadmap, but they’re still working on stage 1).

Maybe Mojo will achieve all its goals, but so far it has been over-promising and under-delivering; it’s starting to remind me of the V language.


The messaging led me to try running some very simple Python code (reading a file line by line), assuming it would of course run. It didn't work at all.

For me this was a big disappointment, and I wonder how much this has backfired across developers.


Isn't that achieved by Codon?

Really the only thing good about Python is its ecosystem.

Nah, it's also a very fine language for getting an idea down quickly.

It might not have the niceties purists like, but perhaps that's exactly why it's a great language for that.

It's like executable pseudocode, and unlike other languages, all the ceremony is optional.

People flocked to it way before it became a "must" for ML and CS thanks to that ecosystem becoming dominant.


Look at my statement in the context of Mojo. Why would someone use this language if all they want is to "get an idea down quickly"? They would just use Python instead.

But that ecosystem is really good.

That it is

They just lie a lot: they make fake blogs with fake benchmarks and then they delete them.

Mind linking some?

HAX Ventures (Formerly HAXLR8R) https://hax.co/

TroubleMaker (Conquer! program) https://troublemakershenzhen.com/member/apprentice-membershi...

Seeed Maker Camp https://www.seeedstudio.com/blog/2024/01/15/introducing-make...

I was an alumnus of HAXLR8R before the rename.


Funny how people are suddenly on Elsevier's side. It's clear to me that AI training is transformative fair use under existing law. Maybe this will be the case to prove it.

I find it grating that so many AI boosters try to frame pushing back against the AI industry as a sudden about-face for everyone that spent the last 20 years pushing back against the copyright industry. I’m also in favor of decriminalizing or legalizing small amounts of pot for personal use. That doesn’t mean I’m behind industrialized narcotic production on such a huge scale that it starts to distort the economy, and companies looking for new ways to add methamphetamine to every goddamn product.

>I find it grating that so many AI boosters try to frame pushing back against the AI industry as a sudden about-face for everyone that spent the last 20 years pushing back against the copyright industry.

What do you think the outcome of tightening fair use is going to be? Do you think it's going to be most effectual against these big evil AI companies we are meant to fear? Or is it going to end up putting more individual creators on the end of Disney's pitchforks?

Like, if you support creating a gun to kill a monster, that's great. But you need to understand that weapons rarely target only the person you want them to. And it's unlikely that any bill that specifically targets a certain size or profit margin is going to make it all the way into law without being generalised to the approval of large IP holders.

It's much, much (much) better to look at this as an opportunity to erode IP laws for everyone than to make them worse and hope that your particular enemies are the only ones affected.

>That doesn’t mean I’m behind industrialized narcotic production on such a huge scale that it starts to distort the economy, and companies looking for new ways to add methamphetamine to every goddamn product.

That's such a non sequitur. This isn't a weed legalisation argument, it's "Do we make IP worse for everyone because you don't like some people benefiting from fair use?"


One could imagine different legal standards for recreational, research, and commercial uses.

> One could imagine different legal standards for recreational, research, and commercial uses.

Meta used allegedly stolen copyrighted materials to train a model they shared for free with the whole world. Is this a recreational use?


No, it is not recreational use. And no, they are not freely sharing it. It is used to build a monopoly, make honest competition impossible, and plan to charge as much as possible for it.

It is the same playbook every time. We don't have to be naive and pretend Meta is doing something for other people's benefit.


>And no, they are not freely sharing it

Are you unable to access this page?

https://www.llama.com/llama-downloads/

Or this one?

https://lmstudio.ai/models/meta/llama-3.3-70b

>It is used to build a monopoly

How?

>We don't have to be naive and pretend Meta is doing something for other people's benefit.

Meta benefits from the current war of open model competition, but we also benefit from it. In particular, participating in all this makes it hard for them to pull the ladder up when the market changes. They will have to justify why whatever new hotness is better than these existing models already on our hard drives.


Tell me more about these methamphetamine products. Inquiring minds would like to know!

It would be disingenuous framing because the argument against copyright stems from a belief that information should be free. Meta does not do things in this spirit. There's no about face needed...

> It would be disingenuous framing because the argument against copyright stems from a belief that information should be free. Meta does not do things in this spirit.

Don't they? They release the llama model weights, they do things like this:

https://www.opencompute.org/wiki/Open_Rack/SpecsAndDesigns

They also make significant contributions to Linux and are the originators of popular open source projects like zstd and React.

They make their money from selling ads, not selling licenses.


They only released the weights because someone leaked them.

Someone leaked the llama 1 weights before they were released. That doesn't explain why they would release the subsequent versions except that they wanted to.

Speaking of AI and meth, have you seen videos of the Palantir CEO Alex Karp? Dude looks like he's regularly getting the same meth shots Hitler used to get.

But I hear you. One of my biggest tells that someone can't be reasoned with is when they resort to whataboutism without any consideration for how two situations can actually be different even if there is some commonality. It's a powerful bad-faith argument technique. When that style of argument comes up, I nod my head and walk away. Some people are just doomed.


[flagged]


I am not a copyright maximalist, but I would tell you to be careful of a world where copyright and IP are meaningless. Might as well let any other country or company one-shot your entire industry.

Slippery slope, false dilemma, etc. What other fallacies do you have in your utility belt, batman?

How did you know I was Bruce Wayne?

Where's my goddamn electric car Bruce?

BYD sells them for super cheap.

I also find it funny. I said this regarding the other thread and article [0]:

'"They then copied those stolen fruits"

How are these fruits "stolen" if they still have what was allegedly stolen?

Dowling v. United States, 473 U.S. 207 (1985): The Supreme Court ruled that the unauthorized sale of phonorecords of copyrighted musical compositions does not constitute "stolen, converted or taken by fraud" goods under the National Stolen Property Act

And even if, arguendo, sure, it's stolen. The purpose of copyright is "To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries"

And you would be hard pressed to prove that LLMs haven't advanced the arts and sciences, so at a bare minimum it's transformative, i.e. fair use.'

[0] https://news.ycombinator.com/item?id=48026207#48029072


>How are these fruits "stolen" if they still have what was allegedly stolen?

If you write a book and I take it and embed its knowledge into my product that is so pervasive that no one needs to buy your book any more (and I don't even credit you so no one knows where that knowledge came from), do you really still have what was stolen? And I didn't even buy a copy of your book to copy it.


> If you write a book and I take it and embed its knowledge into my product that is so pervasive that no one needs to buy your book any more (and I don't even credit you so no one knows where that knowledge came from), do you really still have what was stolen?

The trouble with this analogy is that it proves too much.

Suppose you write a book, and so does someone else, but they have better marketing than you and then people in the market for that genre buy theirs instead of yours. Let's even stipulate that the existence of their book actually lowers your sales, because people who want that kind of book already bought theirs by the time they find out about yours and then some people don't have time to read or can't afford to buy both.

Notice that we haven't yet said a word about the contents of either book. They could be completely independent and they've never even heard of you or your book -- they "didn't even buy a copy of your book to copy it". All we know is that they're the same genre and the existence of theirs is costing you sales. By that logic all competition would thereby be "stealing", and that can't be right.

Which implies that you don't have a property right to the customers.


A better analogy would be that you do original research or work and produce a valuable book. Somebody else looks at your work, decides it has value, and reproduces it in a new book under their name. The new book is cheaper, or easier to find, or for whatever reason displaces your original book created through your own research and investment. Now somebody else is profiting off your creativity or work, without payment or even acknowledgement.

I'm not sure how this plays out legally, but it certainly seems unethical.


So for example, when Disney sees value in public domain stories like Cinderella, Rapunzel/Tangled or Snow White, and they make movies out of them, profiting from the creativity and work of the Brothers Grimm without paying anything to their estate, or high school plays do Shakespeare, that seems unethical to you?

Would it be fair for Greece to do retroactive term extensions all the way back to Plato and then sue anyone who copies the idea of having a university or uses the Platonic solids or distributes religious texts that incorporate the dualistic theory of the soul?


Your examples, as you say, are all public domain. Are all the works we train LLMs on public domain too? Was the original book in my analogy in the public domain? What do you think about training on material that isn't yet in the public domain?

You're framing this as an ethical question, but copyright term lengths are essentially arbitrary. They're set by the government, as are the boundaries of fair use. At which point you're making a circular argument. That it's bad if it's illegal and that it should be illegal because it's bad. So what happens if someone argues the opposite? That it's not unethical if it's fair use and then it should be fair use because it's not unethical.

I'm not making a circular argument, nor one based on legality. You explicitly changed your example to use "public domain" content, and ignoring the legal specifics of that it's clear that's a separate category of content. Most people have no ethical issue with remixing or using content that has already done the rounds and delivered most of its immediate value to the creator. This is very different to your earlier examples with books, framed as two contemporary pieces of media competing with each other.

Letting companies train LLMs on the "classics" is very different to training on contemporary media where the creator still depends on it.


I like your argument, not because it is a good analogy for AI but because it is a good contrast. Copyright isn't a guarantee or magic force field blocking fair competition. It is a permeable buffer against lazy knockoffs and time-boxed so that buffer doesn't choke all future creativity.

People on this thread need to focus on what "derivative" and "fair use" mean and understand both are measured on a somewhat fuzzy spectrum, subject to interpretation.

In a perfectly fair world AIs/MLs could vacuum up all human knowledge, fair and square. (In an ideal world, they would do that adhering to polite opt-in/opt-out agreements with copyright holders. We can dream). Input isn't theft.

On output, two magic genies would stand at the gate, the Derivative Genie and the Fair Use Genie, and review anything spat out by the AI/ML. If it crossed agreed-upon thresholds, the Genies would bar the gates and issue a stern warning to prompt again (or maybe the AI/ML would auto-adjust the prompt and try again).

So, if your prompt asked for a 300-word poem about thrash metal mosh pit dancing and it spat out a poem where 85% of it matched one of the handful of available mosh pit poems in its database, the Derivative Genie would block the output and raise an alarm.

On the other hand, if you asked for a line-by-line analysis of a famous mosh pit dancing poem (by name), or maybe a satirical spoof of said poem, the Fair Use Genie would overrule the Derivative Genie and give the output a pass.
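The two-gate idea in this comment could be approximated, very crudely, by measuring how much of an output's word trigrams also appear in a known work. This is a hypothetical sketch: the trigram measure, the `fair_use_request` flag (standing in for whatever would classify a prompt as analysis or parody), and the 0.85 threshold borrowed from the 85% example are all assumptions, not an existing system.

```python
from __future__ import annotations


def word_ngrams(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """All n-word sequences in `text`, lower-cased and whitespace-split."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_ratio(output: str, source: str, n: int = 3) -> float:
    """Fraction of the output's n-grams that also occur in `source`."""
    out_grams = word_ngrams(output, n)
    if not out_grams:
        return 0.0
    return len(out_grams & word_ngrams(source, n)) / len(out_grams)


def gate(output: str, corpus: list[str], fair_use_request: bool,
         threshold: float = 0.85) -> str:
    """Hypothetical output gate combining the two 'genies'.

    The derivative check blocks output whose overlap with any known work
    reaches `threshold`; a request flagged as fair use (analysis, parody)
    overrides it.
    """
    worst = max((overlap_ratio(output, work) for work in corpus), default=0.0)
    if worst >= threshold and not fair_use_request:
        return "blocked"
    return "allowed"


poem = "the pit swirls and bodies collide beneath the red stage lights"
print(gate(poem, [poem], fair_use_request=False))  # blocked
print(gate(poem, [poem], fair_use_request=True))   # allowed
```

Trigram overlap is easy to defeat by paraphrase and says nothing about whether a use is actually fair; it only shows where the thresholds described here would plug in.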

That's as fair as this could get, especially if you add one more thing: an Appeals Court (maybe corporate, maybe third-party, maybe state-run) with a Settlement Pool. If a copyright holder could prove the Genies let something pass that they shouldn't have, the AI/ML would fix that. If real damage was done, the creator would get a settlement from the pool.

The point is that the Input Genie is out of the bottle. Creators just look foolish trying to squeeze it back in. Better, they should focus on making the output Genies and the Appeals process as effective and fair as possible for everyone.


Why are you talking about this case that has nothing to do with the topic at hand? The comment you’re replying to gives a very clear and narrow analogy, and you’re talking about something else.

How is it something else? It's the same analogy. The problem with it is that the harm from the alleged theft doesn't require any use of the original material in order to happen, since that "harm" is competition rather than expropriation.

The attempt to distinguish them is through copying, but that's the part that isn't depriving anyone of anything.


The main point here is _using_ copyrighted materials to create a commercial product, that you then sell, that may be used as alternative or substitute for the original materials. You’re missing that point and talking about two independent projects competing.

Because the competition is the only source of alleged harm, but people can do that even if they don't copy anything. There isn't actually a property right to the customers. You can lose sales to someone else whether they copied anything or not.

So what if you can lose sales even without crimes being committed? This somehow makes it okay to profit off someone’s work and ignore licenses?

What if I read your book (and a bunch of other books), and use what I learned to write my own book? Have I "stolen" your book?

Facts are not copyrightable. Only your particular way of expressing those facts is copyrightable.


Yes. That's not to say that something damaging wasn't done, but nothing was stolen. Stealing/theft requires deprivation of property. It's like receiving a normal nonlethal punch in the face and calling it murder. Murder requires someone dying.

> Theft [...] is the act of taking another person's property or services without that person's permission or consent with the intent to deprive the rightful owner of it. --- https://en.wikipedia.org/wiki/Stealing


My God, I can't believe chodes are still playing this "how many angels can you fit on the head of a pin" navel-gazing semantic argument. Thirty years at least, it was all you saw on fin de siècle Slashdot from anyone with a six-digit UID. No one cares about your hyper-literalist meaning of "theft"; that's not the goddamn point. Christ, this place looks like Reddit more and more.

This isn't a court of law. We don't have to talk like lawyers. If you replaced "theft" with "copyright infringement" in the comment you had such a problem with, what meaningfully changes besides we all have about five additional brain cells?


Even the case for copyright infringement is weak. LLMs are not copying machines; we already have copying machines that work at a much lower price (almost zero), with perfect fidelity, and much faster than generating text probabilistically. So it makes no economic sense to spend billions on training and inference to make a copier. In fact, the value of LLMs lies where they do not copy but apply knowledge to a new situation.

> If you replaced "theft" with "copyright infringement" in the comment you had such a problem with, what meaningfully changes besides we all have about five additional brain cells?

The obvious difference is that copyright is subject to fair use and various other limitations that personal property isn't.


Ever hear of Aaron Swartz?

Aaron Swartz was charged under the CFAA, which isn't even copyright law, and the prosecution was widely condemned as draconian overreach.

>> Stealing/theft requires deprivation of property

maybe you should look up the definition of property, which is a set of legally recognized rights over a thing, typically including:

* possession (what you're focusing on)

* use

* exclusion

* transfer

The last three seem to have been breached, and legally that's theft.


Violation of these rights may be criminal without meeting the strict legal definition of theft.

This can even extend to stealing physical property.

Depending on local laws, stealing a car may not actually be theft if the defendant can prove they intended to return it before the owner got home from work, though it would certainly be considered theft in the colloquial sense of the term, and they would still be guilty of a lesser offense like civil and/or criminal conversion.


> Depending on local laws, stealing a car may not actually be theft if the defendant can prove they intended to return it before the owner got home from work

I doubt there's even one place where the law works like that.


> I doubt there's even one place where the law works like that.

In a lot of places, that's how it works. A key element of theft is the intent to permanently deprive someone of property.

This is why joyriding isn't classified as auto theft and is instead a lesser offense. It's because joyriding is an intent to temporarily deprive, while GTA is an intent to permanently deprive.

In some jurisdictions (the UK is one), there is a tort called trespass to goods, and an example of this would be "stealing" someone's property to deliver to another location for them to use there. The tort of conversion is similar: interference with someone's property right to treat it as your own (silent as to length of time).


Yeah, in the US if someone tries to steal your car and you are in it or threatened by it, you can shoot them dead or something like that (IANAL). You may have a court date, but in many situations no punishment will follow.

Theft is not the breach of any property right. It's specifically the deprivation of property without consent. Yes, I have checked the definition in my jurisdiction.

Getting punched in the face also violates rights, yet isn't murder. Murder is specifically about dying.


You forget that laws are made by people and can change at any time. Interpretations are arbitrary: Roe v. Wade today, but not tomorrow.

People seem to think what AI is doing today is theft. If enough people agree, it will be theft. Big companies don't like this and push the other way. There is no objective answer here; it is too wiggly.


You’re splitting hairs over a definition that isn’t relevant here (theft and copyright infringement are different things) to defend something that even you agree is bad.

It isn't splitting hairs. The damages are completely different in nature.

With theft, the entire damage is the deprivation. It could be an heirloom or some other object that may have been entrusted to you, something that can never be replaced, memorabilia of loved ones. Something that you may have needed in your possession to survive (e.g. a car to go to your job).

With a given copyright violation, the damage is that maybe[1] you made less profit than you could have. The potential for profit is not property. Profit isn't guaranteed.

[1] The loss is not certain, because there's no guarantee that the ones consuming the copyrighted content could have even afforded it.


Cool cool cool. So all the code and data you send to Anthropic and ChatGPT should be mass-distributable to further other people's arts and sciences? All your meeting notes with AI summarizers, Slack chats with bots? Might as well put your entire company and all your plans for it on GitHub, MIT-licensed. I'll take a peek and see if there's anything valuable to me in there. Don't worry, you can keep it all on your GitHub too. It's still yours, after all. Copilot will be training on it too, though, btw.

That's a privacy violation, not relevant.

No it's not. You exposed that data to an LLM. Should have read the fine print. The laws around that don't make sense to me anymore, so therefore I own that stuff now. That's how this works, right? You do know ChatGPT etc. can read everything you write, right?

Also social media profile pics. Great way to get faces for deep fake ads. Most people are just 1 phone call away from being voice cloned. Our likeness isn't all that important either if you think about it.

Maybe Meta will clone your writing style, sign into your Meta account, and message your friends telling them about this awesome new product. Meta owns the account and you uploaded data to it.


Literally none of these things are defensible positions, so nobody will take you seriously.

Many of the things I wrote are already happening. The others probably are but haven't been reported yet.

I think Anthropic has pledged not to use Team and Enterprise users' data for training purposes. I don't mind if they do some verification or whatever, as long as it doesn't end up in the responses it gives others.

I have an amazing timeshare for sale and you seem like someone who would really see the opportunity this provides. How are your financials?

What Silicon Valley company over a decade old has respected the limitations on using data that they agreed to? At least any valuable data.

Yes, yes, and Google pledged "don't be evil."

Don't be naïve. A corporation would tear the flesh from your body if it meant a better quarterly earnings report.


Having seen someone die at work, this is factual. The comments made during and after were eye opening.

You were swiftly corrected about your misunderstanding under your original comment. Reposting it here, removing the quote further from its context, and hoping not to be downvoted again is very weird!

I don't see how quoting the actual complaint the news was about, in both threads, means I was swiftly corrected. If you were to base it on upvotes, then this one shows I'm right and you got swiftly corrected here. In both cases it was relevant, as the two threads were not yet merged, were about the same complaint, and held two positions on the front page, and I was adding to the discourse.

>It's clear to me that AI training is transformative fair use under existing law.

I wouldn't even go that far. It's an entirely new product. It's like the guy who sold you the keyboard demanding royalties for the software you built.

That the person who wrote the book couldn't predict a new use case for it in training LLMs is irrelevant. The book isn't in the LLM. It's not being sold with the LLM. It's one of billions of tools used to create the LLM.

People try to sell this as the AI companies extracting value from poor little IP holders like Disney. It's maddening. That content is your cultural heritage. It already belongs to you; it's just that some idiot has been granted a lifetime of exclusive exploitation. An LLM is trained on data you already own. Disney et al. want to exploit the new technology to extract even more money out of stuff often created decades ago.

At absolute worst it's reverse engineering, which was supposed to be protected as fair use in the US, but apparently that's been somewhat eroded.


> The book isn't in the LLM.

An LLM is essentially a lossy compression of the training data. The book absolutely is in there, it’s just mangled to the point of unrecognizability.


The wood tends to have an impression of the hammer that hits it. The book isn't in there, the weights are just shaped by what tools were used to form it.

When large quantities of source material are replicable by prompting, it's a bug, not a feature.


That's just semantics. The wood would be there without the hammer, the LLM wouldn't be here without the copyrighted works it's based on.

No, that's just semantics.

> The LLM wouldn't be here without the copyrighted works

Google wouldn't be here if it hadn't scraped every copyrighted website and used them to form a searchable graph of the internet but we only hear complaints about them when they reproduce entire news articles.


If my book isn’t in your LLM, then prove it and don’t use my book to train your LLM.

>don’t use my book to train your LLM.

What makes you think you are entitled to tell people what they can and can't do with data they purchased (or otherwise acquired) from you? Extremely honest question. I just can't put myself in your shoes.

Like if I had written anything useful I would be overwhelmingly flattered that my content be considered so worthy for inclusion.

Your profile suggests that you are a philosopher. Did you get into philosophy hoping to exploit the publishing industry to the extent that you can squeeze every cent out of your thoughts, and deny their potential uses downstream?

It's actually crazy how bad things are. I am usually keen on capitalism and exclusivity, but with this whole LLM thing, I see people pushing hard to tighten the grip of intellectual property. I see people making 50 cents a month on Kindle Unlimited suddenly shocked that someone's LLM-generated output might be ever so slightly influenced by weights ever so slightly influenced by their work, seemingly thinking they might get some big payday out of it.

Give me a tiny little wedge of understanding of your thought process. Your book is right now, doing a greater social good on your behalf than me running around and removing all the trash from my neighborhood, and the benefits of that social good are going to accrue long after you and I are gone. Your work is now going to live on, in a very tiny way, in these systems forever. I am honestly envious.

If anything, I would be trying to get bad writing removed from LLM training data: things that I don't want to influence others. But as a potentially honest promoter of your work, you want it removed?

What's the number? If not exactly 1:1 with what you charge for the book, what do you think is the proper compensation you should receive for slightly influencing training weights?


> What makes you think you are entitled to tell people what they can and cant do with data they purchased

Hundreds of years of copyright law. I bought a copy of Windows, but I’m not allowed to modify that data with a cracker and sell a bootleg DVD of it.

I should edit to clarify that I'm not a big fan of Lars Ulrich or Disney, but I don't think we're going to get a win here for the recreational IP pirates. What's more likely is that we'll end up with some Frankenstein law that favors both Mickey Mouse and OpenAI, and you and I will get neither free movies nor the ability to earn a living off of our creative labor.


I mean, the comparable situation would be, being allowed to sell something you created on Windows.

But in the abstract, you should absolutely be able to modify and sell Windows.


To continue your analogy, I had to pay for Windows before I was allowed to create something with it, or acquire a license for it under terms they set forth. If AI companies stopped at the public domain, then my argument wouldn't really hold up, but they didn't do that. They acquired everyone's copyrighted works without regard for the license and now they're, in the most charitable interpretation, using them to create derivative works.

And before you give me an analogy about how someone could listen to Pink Floyd and then produce works inspired by their influence yada yada: Someone is a human being with human rights, and if we're going to start pretending that training an LLM is in any way analogous to human consumption and creativity, and not an industrial process that encodes input data into a digital artifact, then let's start by saying LLMs have human rights and cannot be owned by a company that charges for access to them.


>To continue your analogy, I had to pay for Windows before I was allowed to create something with it, or acquire a license for under terms they set forth.

Yep, and so far it looks like the issue with the Meta case is they didn't pay for the book, not that they used it in training data.

>in the most charitable interpretation, using them to create derivative works.

Yeah in the same way I use a hammer to create a derivative table.

>Someone is a human being with human rights, and if we're going to start pretending that training an LLM is in any way analogous to human consumption and creativity.

I don't care about that. It's simply a tool being built using existing tools, like using a jigsaw to make a stepladder.


> Yep and so far it looks like the issue with the Meta case is they didn't pay for the book. Not that they used it in training data.

Let's not sane-wash what they did here, they didn't just 'forgot to pay for the books', they deliberately and illegally downloaded and used material that wasn't theirs to use.

If you or I did that, we would be jailed or sued into destitution. In a fair world we either should change copyright laws (allowing for anyone to freely pirate all media), or Zuckerberg needs to go to jail.


>Let's not sane-wash what they did here, they didn't just 'forgot to pay for the books', they deliberately and illegally downloaded and used material that wasn't theirs to use.

Yes. Forgot is your word.

But let's face it, there wouldn't be a case to answer if they had paid retail for each book, torn them up, scanned them, and trained on that data.

>Zuckerberg needs to go to jail.

I am comfortable with that but would prefer updating copyright.


A million dollars please.

It’s called a copyright notice. Same as a license. If you’re running a commercial business you can’t legally just take that piece of work and reuse it. Pick any book off your shelf and pretty well every one of them will have words to the effect of:

All rights reserved. No part of this publication may be reproduced, distributed, or transmitted in any form or by any means, including photocopying, recording, or other electronic or mechanical methods, without the prior written permission of the publisher, except in the case of brief quotations embodied in critical reviews and certain other noncommercial uses permitted by copyright law. For permission requests, write to the publisher, addressed "Attention: Permissions Coordinator," at the address below.

Same as every piece of commercial software has a license which has to be abided by. Same as use of Meta’s service has terms and conditions which HAVE to be agreed to.

So yeah they’re free to break that license but they’re also free to be sued by IP holders for breaking it at scale.


Well, it's not a solved issue in terms of law. But even so, I would have expected you to understand that I wasn't speaking legally.

Illegally obtaining copyrighted materials is usually the issue, not the transformation part.

Looking at the complaint ( https://publishers.org/wp-content/uploads/2026/05/2026-05-05... ), that seems like the part that's got the most solid foundation, especially given that while torrenting the books, they were also seeding to other peers.

The items they call out around training the models (and attempting to claim that each subsequent model generation should count as an additional instance of infringement) seem far less grounded in the current court interpretations of AI training.


Absorb all "our" IP without consent, in doing so remove "our" own source of revenue, and then repackage it as their own product. Not really fair use IMO.

How does that work? Is it a kind of infringement without substantial similarity?

I find it hard to think of a reasonable analogy. But it's like coming into your house, stealing all your belongings, and then building a new house with all your shit inside and then selling it back to you.

I think this completely misses the point... the point is that Meta pirated the media they used to train their model.

I am not a fan of US copyright law, but if I torrented millions of books, I would be facing a felony charge in criminal court and a multi-billion dollar lawsuit in civil court (with statutory damages as high as $150,000 per title in cases of willful infringement).

In my opinion, this has nothing to do with whether or not AI training is transformative and thus fair use, and everything to do with whether or not the laws apply to everyone equally. If Facebook isn't forced to pay billions and elect a sacrificial executive to serve prison time, then I will remain angry.


> It's clear to me that AI training is transformative fair use under existing law. Maybe this will be the case to prove it.

That is not what this case is about. It is more about Meta's illegal piracy of copyrighted content for commercial use, and the fact that Zuck knew they were doing it.

Why did Anthropic settle [0] with a multi-billion dollar payout to authors after commercializing LLMs that were trained on copyrighted content that was illegally obtained and kept without the authors' permission?

There's a reason why they (Anthropic) did not want it to go to trial. (Anthropic knew they would lose, and damages in the hundreds of billions would completely bankrupt them.)

AI boosters will do anything to justify the mass piracy and illegal acquisition of copyrighted material for commercial use (not research), which is not fair use in the US. There is no debate on this. [0]

[0] https://images.assettype.com/theleaflet/2025-09-27/mnuaifvw/...


I think copyright is far from being the most important aspect of AI; the bigger issues are geopolitical and economic. And even if it were the most important, there is only a case to be made for (1) the copy used to train models and (2) rare or induced regurgitation by targeted prompting.

The original work is not replicated identically. Why would we replicate a work when it can be more easily seen in the original, or replaced with alternative options online? We use AI to produce new outputs for new situations. We already have drives and networking for plain copying.


If I could ask an LLM for a summary instead of buying a book, I'd go with the summary. That eats into commercial use, and the Supreme Court sided with Gerald Ford when a newspaper published a small gist of his autobiography, because it ate into the sales.

Every single Wikipedia article about a book or TV show has this kind of summary. Ford should have lost.

Probably. Educational purpose is a strong component of the fair use doctrine.

Yeah, nope. I like the full book without any loss of information, even if I don't want to read the entire thing. LLMs love to respond even when something is outside of their training set.

It's not settled law so I'm not sure how that's clear to you.

I think both Elsevier and the people who appropriate IP to train commercially deployed AIs without the consent of the author(s) should be legal.

It actually depends on the evilness of the company. Elsevier is just less evil than Zuckerberg and Meta, while publishers are even less problematic. I don't think there is anything funny in that.

Or anything to defend on Meta. If they go out of business, humanity profits.


Elsevier is shitty to people doing stuff that (imo) should be allowed. Meta is making money doing the same thing and not getting the same shittiness from Elsevier.

Elsevier at least works within the (admittedly broken) system, Meta does not.


When you use millions of copyrighted materials to bundle together to produce a commercial product, I wouldn’t call that a fair use. Especially when licensing of such material doesn’t explicitly allow that, the material wasn’t even purchased on consumer markets and your commercial product may be a competitor/analogue to the copyrighted material.

Not even getting into all the GPL stuff, which in a better world should have screwed all the slop companies.


The enemy of my enemy, and all that.

I'm not on Elsevier's side, but I still think it's bullshit that giant companies are allowed to do things at a scale that I'd go to prison for.

That's always going to be true for the Capitalist class.

And yet I continue to rage against the dying of the light.

"Funny" is how dishonest snipes are framed. It such a common trope of internet quips, it's wearing me out. Can we please try to just format our disagreements without the snideness?

Such a garbage take. This is not a parody or a critique. Mark Zuckerberg is not Weird Al Yankovic.

A big part of the problem here is that Del Monte was the victim of several leveraged buyouts that had executives walking away with millions while the company was saddled with debt.

Exactly. That is what is missing in this discussion. If you want to cut down the trees, fine, but those people who profited should pay for it.

I always wonder where consumer surplus fits into arguments about profit.

Although in this particular situation clearly the consumer surplus wasn't enough to keep consumers buying Del Monte products.

https://en.wikipedia.org/wiki/Economic_surplus

If we measure consumer surplus as a percentage, how would it compare to business profits as a percentage?

Edit:

  Nobel laureate William Nordhaus studied the historical data of the U.S. economy and concluded that innovators and corporations capture only a tiny fraction of the total social value they create. Consumers capture ~98% of the value in the form of surplus. Producers capture ~2%.

I think that notion is mostly meaningless to actual humans.

I'm not sure I understand your point? If you are private equity and do a leveraged buyout, the company is priced as if you could extract the current value of the company out of the acquisition. As if the company were a consumable basically, because that's how you're going to pay off the loan. If consuming the company requires mistreating customers (getting rid of consumer surplus), then that's what's going to happen. The way you're talking about this sounds like the cause is a lack of consumer surplus when that's just a symptom of a leveraged buyout.

Also, Nordhaus being a Sveriges Riksbank Prize laureate tells you how silly and meaningless the Sveriges Riksbank Prize in Economics is. His work on climate change is so bad it's embarrassing.


I'm trying to explore how we decide on root causes, and how many seem to want deserving victims to punish.

Is "those people who profited should pay for it" a desire to guillotine[1] those "executives walking away with millions".

Who profited? Do we blame the executives? Should we search for culprits of modern capitalist systems? How much is my fault or responsibility?

Sorry for the horrid quote - it was there to illustrate the question about consumer surplus - but it is too close to trolling.

> consuming the company requires mistreating customers (getting rid of consumer surplus)

I don't think you are using surplus meaningfully

Byrne Hobart[0] calls such acquisitions strip-mining of goodwill. Essentially extracting money from intangibles by destroying a brand. He uses brutally vivid metaphors, but with solid economics.

Yeah the Sveriges Riksbank prize seems ignoble.

[0] Byrne Hobart writes The Diff. Worthwhile subscribing to the free tier, although there is a lot of referencing to paid tier content. https://diff.substack.com/

[1] I've just read «A Tale of two cities» which uses the French revolution for English entertainment.


The problem isn't that the trees are in the wrong place. The problem is that there are more trees than demand for canned peaches. It's a failure of planning on the part of Del Monte and peach growers.

Covid boosted the sale of canned food, but people avoid the sugary syrup of canned fruit in non-emergency situations.

They plant something else. There just isn't demand for canned peaches anymore, so this is exactly what should happen. It's just unfortunate that it had to happen all at once with this bankruptcy rather than in a more organized fashion that could have prevented these unneeded orchards from being planted in the first place.

I'm sorry, but this is completely wrong. California canning peach farmers are organized and crop prices are set by industry-wide bargaining with processors every year. Additionally, now that Del Monte is out of the business, the only remaining operating canneries are owned by a grower cooperative. It didn't save the industry. In fact, it may have led to the irrational planting of these trees that now need to be pulled. Source: my father was a peach farmer and chairman of the board of the California Canning Peach Association for many years. But he saw this coming and got out of the business.

I’m an agronomist and while I don’t directly deal with that level of things, what you wrote sounds roughly like what goes on for the hazelnut industry here in Oregon.

https://www.hazelnutbargaining.com/


He saw demand falling or what? What did he swap to?

He saw demand falling, exports falling due to the strong dollar and increased productivity in international farming, mismanagement at the canneries with executives cashing out using leveraged buyouts and saddling the companies with unsustainable debt, and trouble finding enough labor (peaches are harvested by hand, almost entirely by migrant workers from Mexico because no native Californian is willing to climb up and down ladders all day in 110 degree heat and 100% humidity, and it's hard to ensure legality).

He switched to almonds and walnuts, which are less labor intensive and have better management on the processing side. But they are an export-heavy market and have also been hammered by the strong dollar. Inflation-adjusted crop prices are near all time lows while costs are at all-time highs. Farming is a hard business!


Smart man! LBOs are such a plague we need better regulation.

Farming is hard. I heard urea prices are up 2x since the start of the year. How many farmers will go out of business because of that…


Grok voice is surprisingly good, actually. It's still a dumber model than the thinking modes of frontier models, but it's less dumb than the voice modes of other providers.

The Grok voice model is also a thinking model. I agree that it's far better than the other voice models.

Just give me an option to have a slower response from a better model…

