

jgammell 30 days ago | on: ML promises to be profoundly weird

> hybrid Mamba/Gated linear attention layers

Do any large-scale architectures use Mamba? I was under the impression that people don't use it yet due to a lack of efficient implementations.

> Training is also vastly more sophisticated

Is it? In what ways?

joefourier 30 days ago

Qwen3.5 uses Gated Delta Networks, which are essentially Mamba 2 plus the delta rule. They're quite hardware-efficient.
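To make "Mamba 2 + delta rule" concrete, here is a minimal sequential sketch of a gated delta rule recurrence, assuming the commonly published form S_t = alpha_t * S_{t-1} (I - beta_t k_t k_t^T) + beta_t v_t k_t^T with output o_t = S_t q_t. The scalar decay alpha_t plays the Mamba 2-style gating role, and beta_t sets how strongly the delta rule overwrites the value stored under key k_t. Assumptions: a single head, L2-normalized keys, plain Python lists, and hypothetical helper names (`gated_delta_step`, `gated_delta_scan`). This is the naive token-by-token loop, not the chunked hardware-efficient kernel used for training, and it says nothing about how Qwen3.5 actually configures its layers.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # recurrent state S, shape d_v x d_k


def l2_normalize(x: Vector) -> Vector:
    """Scale x to unit length (assumed here; keeps the delta-rule update stable)."""
    norm = sum(xi * xi for xi in x) ** 0.5
    return [xi / norm for xi in x] if norm > 0 else list(x)


def matvec(S: Matrix, x: Vector) -> Vector:
    """Compute S @ x for a list-of-rows matrix."""
    return [sum(sij * xj for sij, xj in zip(row, x)) for row in S]


def gated_delta_step(S: Matrix, q: Vector, k: Vector, v: Vector,
                     alpha: float, beta: float) -> Tuple[Matrix, Vector]:
    """One step of the (hypothetical sketch of the) gated delta rule:
    S_t = alpha * (S - beta * (S k) k^T) + beta * v k^T,  o_t = S_t q
    """
    k = l2_normalize(k)
    pred = matvec(S, k)  # value the memory currently returns for key k
    new_S = [
        [alpha * (S[i][j] - beta * pred[i] * k[j]) + beta * v[i] * k[j]
         for j in range(len(k))]
        for i in range(len(v))
    ]
    return new_S, matvec(new_S, q)


def gated_delta_scan(qs, ks, vs, alphas, betas, d_k: int, d_v: int):
    """Run the recurrence over a sequence, starting from a zero state."""
    S: Matrix = [[0.0] * d_k for _ in range(d_v)]
    outputs: List[Vector] = []
    for q, k, v, a, b in zip(qs, ks, vs, alphas, betas):
        S, o = gated_delta_step(S, q, k, v, a, b)
        outputs.append(o)
    return S, outputs


if __name__ == "__main__":
    ks = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    qs = [[1.0, 0.0]] * 3
    vs = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]
    alphas = [1.0, 1.0, 0.5]  # step 3 decays the whole state by half
    betas = [1.0, 1.0, 0.0]   # step 3 writes nothing
    S, outs = gated_delta_scan(qs, ks, vs, alphas, betas, d_k=2, d_v=2)
    print(outs)  # [[1.0, 2.0], [3.0, 4.0], [1.5, 2.0]]
```

In step 2 the second value written under the same key replaces the first, so the query reads back [3.0, 4.0]. A plain linear-attention state (S_t = S_{t-1} + v_t k_t^T) would only accumulate and return [4.0, 6.0]. Step 3 then shows the gate alone shrinking what is stored.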

> Is it? In what ways?

Reinforcement learning for reasoning alone, and then tool use for agents, could each be a topic of its own.
