Hacker News | metalwhale's comments

Hello everyone! I'm writing a blog post about my experience training a minimal DDPM and just want to share what I've learned so far. Feel free to read it and discuss with me.


Thank you for sharing this! I have one question: is there any plan to add support for local LLMs / embedding models?


"Right now the system only supports OpenAI as an embedding provider, but we plan to extend with local and OSS model support soon."

It's in the post you responded to.


Haha I feel so dumb now. Thank you!


This question keeps popping up but I don't get it. Everyone and their dog has an OpenAI-compatible API. Why not just serve a local LLM and put 127.0.0.1 api.openai.com in your hosts file?

I mean, why is that even a question? Is there some fundamental difference between the black box that is GPT-* and, say, LLaMA, that I don't grok?
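For what it's worth, here is a minimal sketch of the request shape that any OpenAI-compatible endpoint (local or hosted) accepts. The URL, port, and model name are placeholders, and nothing is actually sent; the point is only that the payload is identical either way:

```python
import json

# Sketch of an OpenAI-compatible chat completion request. The server URL
# and model name below are placeholders for whatever local server
# (llama.cpp, vLLM, etc.) you happen to run. No request is sent here.
base_url = "http://127.0.0.1:8080/v1/chat/completions"

payload = {
    "model": "local-llama",  # placeholder model name
    "messages": [
        {"role": "user", "content": "Hello!"}
    ],
    "temperature": 0.7,
}

# Serialize the request body exactly as an HTTP client would.
body = json.dumps(payload)
print(body)
```

Since the wire format is the same, redirecting api.openai.com in the hosts file (or pointing the client at a different base URL, which most client libraries support) is mostly a plumbing question.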


I think it cannot surpass the SOTA on some LM evaluation sets, but please understand that achieving better results requires a very good training dataset, which not everyone can afford.

On the other hand, the main selling points of Zamba/Mamba are low latency, fast generation, and efficient memory usage. If that holds up, LLMs could become much easier for everyone to use. All we need to do is wait for someone with a good training dataset to train a SOTA Mamba.


This is interesting.

How can you retrieve the latent representations of the candidate LLMs? Some models do not have open weights (such as GPT-4), which means, AFAIK, it is impossible to directly access their hidden latent space through the API.

Am I missing something?


We just initialize a random latent vector for each model, and then jointly train each of these unique latent vectors :)
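A toy sketch of that idea, with an invented loss purely for illustration (the real training objective is whatever the parent project uses; this only shows "one randomly initialized vector per model, all trained jointly"):

```python
import numpy as np

# Each candidate model gets a randomly initialized latent vector; no access
# to the model's internals is needed. All vectors are then updated jointly
# by gradient descent on some downstream loss.
rng = np.random.default_rng(0)
n_models, dim = 4, 8
latents = rng.normal(size=(n_models, dim))  # one learnable vector per model

# Hypothetical objective for illustration: pull the vectors of model 0 and
# model 1 together, i.e. minimize 0.5 * ||z0 - z1||^2.
lr = 0.1
for _ in range(100):
    diff = latents[0] - latents[1]
    latents[0] -= lr * diff   # gradient w.r.t. z0 is +diff
    latents[1] += lr * diff   # gradient w.r.t. z1 is -diff

# The two vectors have converged toward each other.
print(np.linalg.norm(latents[0] - latents[1]))
```

The trick is that the latent vector is just another trainable parameter, so closed-weight models like GPT-4 pose no problem: only their observable behavior feeds the loss.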


Thank you. Can I ask a question?

Does this mean LLMs can generate text from an empty context? How can an LLM choose the first token without any previous tokens? My understanding is that to compute the logits for the next token, LLMs require all previous tokens as input. Am I correct?


Disclaimer: I'm not the author. Just want to share this awesome project.


Thank you so much for sharing this great repo! I noticed that the source transformation notebook is not finished yet. What is its status now?


Disclaimer: I'm not the author. Just interested in the article and want to share this awesome post.


But it's worth joining ((((:


Well, I will choose shuffle mode if this service is available (((:

