Do any large-scale architectures use Mamba? I was under the impression that people don't use it yet due to a lack of efficient implementations.
> Training is also vastly more sophisticated
Is it? In what ways?
> Is it? In what ways?
Reinforcement learning for reasoning, and then tool use for agents, could be a topic of its own.