> It hallucinates a lot more then Sonnet or even MiniMax M2.5.
Ugh, that's not good.
I evaluated Kimi K2 a while back on some text understanding -> summarisation tasks, and out of 100 tasks it hallucinated in about 30% of the outputs. :( :( :(
I guess it was Kimi K2-Instruct, the first model in the lineup (or a fine-tune of it). I remember trying it purely out of curiosity, and... aside from the almost total absence of sycophancy and "sugar syrup" in its outputs, it was not very good at the time. If you're still interested in this model family, though, you could look at Kimi-K2.5, which is way better.
That said, it's still not perfect, and to be honest, looking at where things are going with LLMs right now, I prefer using my own brain (local private inference at ~20-25W of power consumption, with a capability for continuous learning and performing real-world tasks) over any "AI" model, including proprietary ones such as Claude 4.6 Opus, Gemini 3.1 Pro and others.