Yes, I did that, but it's not Apple Silicon optimized, so it was taking forever for...

spmurrayzzz · on May 1, 2025

You can just use llama.cpp instead (which is what Ollama uses under the hood via bindings). Just make sure you're using commit `d3bd719` or newer. I normally use this with NVIDIA/CUDA, but I tested it on my MacBook Pro and haven't had any speed issues thus far.
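If you want to confirm that a local llama.cpp checkout actually includes commit `d3bd719`, a minimal Python sketch like this could help. It assumes llama.cpp was cloned with git (a full clone, not a shallow clone or a release tarball) and that `git` is on your PATH. The helper name `has_min_commit` is hypothetical. It relies on `git merge-base --is-ancestor`, which exits 0 if the first commit is reachable from the second, 1 if it is not, and some other non-zero code on error.

```python
import subprocess
from pathlib import Path

# Minimum llama.cpp commit mentioned in the reply; override if needed.
MIN_COMMIT = "d3bd719"


def has_min_commit(repo: Path, commit: str = MIN_COMMIT) -> bool:
    """Hypothetical helper: True if `commit` is HEAD or an ancestor of HEAD in `repo`.

    Assumes `repo` is a full git clone (a shallow clone may lack the commit,
    which surfaces here as an error rather than False).
    """
    result = subprocess.run(
        ["git", "-C", str(repo), "merge-base", "--is-ancestor", commit, "HEAD"],
        capture_output=True,
        text=True,
    )
    if result.returncode == 0:
        return True   # commit is reachable from HEAD, so HEAD is that commit or newer
    if result.returncode == 1:
        return False  # HEAD predates (or is unrelated to) the commit
    # Any other exit code means git failed, e.g. the commit hash is unknown locally.
    raise RuntimeError(f"git merge-base failed: {result.stderr.strip()}")
```

For example, `has_min_commit(Path.home() / "llama.cpp")` (the path is just an illustration) returns `True` when your checkout is at `d3bd719` or later.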

Alternatively, LMStudio has MLX support you can use as well.