You're right to be skeptical. Without a way to actually implement how the human brain processes experiences into consolidated memories, we won't be able to solve the long-term memory problem at all. Not with current technology.
An LLM context is a pretty good extended short-term memory, and the trained network is a very nice, comprehensive long-term memory, but because of the way we currently train these networks, an LLM is just fundamentally unable to "move" experiences from one to the other the way a human brain does (through sleep, among other mechanisms).
Until we can teach a machine to experience something once and remember it (preferably on a local model, because you wouldn't want a globally shared model remembering your information), we simply cannot solve this problem.
I think this is probably the most interesting field of research right now: actually understanding in depth how the brain learns, and figuring out a way to build a model that implements it. Because right now, with backpropagation and weight adjustments, I just can't see us getting there.
I think if we want to build on what we have, then instead of compaction at the end of the context window, the LLM would have to 'sleep', i.e. adjust its weights, then wake up with the last bits of the old context window in the new one and a 'feel' for what it did before through the changed weights. I just sense it's not that simple to get there, because naively updating the weights from a single context sample risks degrading the whole network.
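A minimal numpy sketch of that trade-off, with made-up numbers: fold a single new sample into a tiny linear model's weights, with an L2 "anchor" penalty that keeps the weights near their pre-sleep values so one sample can't drag the whole network away. This is an illustration of the tension, not a real LLM training procedure.

```python
import numpy as np

W_old = np.array([0.5, -0.2, 0.1, 0.3])   # "long-term" weights before sleep
x_new = np.array([1.0, 0.0, -1.0, 2.0])   # the single new experience
y_new = 2.0
lam = 10.0                                 # anchor strength: higher = safer, slower to learn

def grad(W):
    fit = 2.0 * (W @ x_new - y_new) * x_new    # pull toward the new sample
    anchor = 2.0 * lam * (W - W_old)           # pull back toward the old weights
    return fit + anchor

W = W_old.copy()
for _ in range(200):                       # a short "sleep" of gradient steps
    W -= 0.01 * grad(W)

# The new experience is partially absorbed (its error shrinks) while the
# weights themselves barely move, protecting whatever they already encode.
print(abs(W @ x_new - y_new), "<", abs(W_old @ x_new - y_new))
print(np.linalg.norm(W - W_old))
```

With a large `lam` the model barely learns the new sample; with `lam` near zero it fits it exactly but the weights drift freely, which is exactly the degradation worry. Real continual-learning proposals like elastic weight consolidation refine this by weighting the anchor per parameter by importance.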
I like the idea of using a small local model (or several) to tackle this problem, something like low-rank adaptation, but with current tech I'd still have to piece this together myself, or the small local models will forget old memories.
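A rough numpy sketch of the low-rank-adaptation idea: keep the base weights W frozen and store new, local "memories" as a small rank-r update B @ A bolted onto them. All the numbers here are made up, and the closed-form solve stands in for actual gradient training of the adapter.

```python
import numpy as np

W = np.array([[0.2, -0.1, 0.0, 0.3],      # frozen base weights (never touched)
              [0.5,  0.1, -0.2, 0.0],
              [0.0,  0.4, 0.1, -0.3]])
r = 2
A = np.zeros((r, W.shape[1]))             # adapter starts as a no-op
B = np.zeros((W.shape[0], r))

def forward(x):
    return (W + B @ A) @ x                # base prediction + low-rank correction

x = np.array([1.0, 2.0, -1.0, 0.5])
base_out = forward(x)                     # == W @ x while the adapter is zero

# "Remember" one new fact: make the model map x to `target`, touching only A, B.
target = np.ones(3)
A = np.array([[0.1, 0.0, -0.1, 0.2],
              [0.0, 0.1,  0.1, -0.1]])    # arbitrary small projection
z = A @ x
B = np.outer(target - W @ x, z) / (z @ z) # rank-1 fit: makes forward(x) == target

adapted_out = forward(x)                  # now hits the target; W is unchanged
```

The point is the separation: W stays frozen, so deleting or swapping the adapter restores the old behaviour, and per-user adapters keep memories local. At toy sizes like this the adapter isn't actually smaller than W, but at realistic dimensions (thousands by thousands, r around 8-64) it is a tiny fraction of the full matrix. Forgetting inside the adapter itself, as the comment above notes, is still the open problem.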
Sleep would probably be part of the equation for consolidation, but there's still the question of how exactly the brain processes information during sleep in a way that consolidates it permanently.
That's not how an LLM can work right now: it needs too many iterations and a much bigger dataset than what we have to work with. A human can experience something a single time and remember it. That's orders of magnitude more efficient than what an LLM can currently achieve.
Couldn't overfitting solve the problem? That's what companies do: take a model as a base and train it on the specific data long enough that it prefers the new data.
Overfitting may be a thing, but for personal use I'd want it to work as I expect, every time.
> I think this is probably the most interesting field of research right now. Actually understanding in depth how the brain learns, and figuring out a way to build a model that implements this.
This field of research has been around for decades, so who's to say when there'll be a breakthrough.
In fact, LLMs are great despite our very limited understanding, and not because we had some breakthrough about the human brain.
Exactly. It's been around so long and we still don't know how to mimic it.
The way an LLM learns is a very interesting way of doing it, but it sure isn't what the brain is doing.
But it's indisputable: we can get enormous results with this technique. It's just probably not the way forward for the faster learning needed to remediate context loss.
Why does a language model have to be monolithic? I think retraining a model is expensive (relatively speaking). Is there some way to bolt on specialization?
It's kind of fascinating that everyone is trying to build a Chinese Room agent with stateless models, since we don't know how to produce a stateful model with continuous, incremental training.
It's like a spontaneous implementation of thought experiments from yesteryear. I wonder if all this product-focused experimentation will accidentally impact the philosophy of mind after all...