Hacker News | kflansburg's comments

I would recommend tracking this data over time. I work on Cloudflare's KV cache for Kimi K2.6, and while there are periods where our cache hit rate is low, we are frequently in the 80-90% range. OpenRouter shows us at 87.3% at the time of this post. We see cache hit rates change quite a bit from hour to hour.

True for Kimi, but the results I published are averaged across models (CF has over 10 models on OpenRouter). Your Kimi K2.6 is currently over 80%, but Gemma 4 26B A4B is at 0%. https://openrouter.ai/google/gemma-4-26b-a4b-it

This is also why providers like Anthropic scored lower: while Opus 4.7 is close to 90%, Opus 4.5 is at 45%.
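To make the averaging point concrete, here is a minimal Rust sketch contrasting an unweighted per-model mean with a traffic-weighted hit rate. The 87.3% and 0% figures echo the ones quoted above; the `ModelStats` type and the lookup counts are hypothetical, and nothing here claims to be how either set of published numbers was actually computed.

```rust
/// Hypothetical per-model cache statistics (illustrative only).
struct ModelStats {
    lookups: u64, // cache lookups served for this model
    hits: u64,    // lookups that hit the cache
}

/// Unweighted mean: every model counts equally, regardless of traffic.
fn unweighted_hit_rate(models: &[ModelStats]) -> f64 {
    let sum: f64 = models
        .iter()
        .map(|m| m.hits as f64 / m.lookups as f64)
        .sum();
    sum / models.len() as f64
}

/// Traffic-weighted rate: total hits divided by total lookups.
fn weighted_hit_rate(models: &[ModelStats]) -> f64 {
    let hits: u64 = models.iter().map(|m| m.hits).sum();
    let lookups: u64 = models.iter().map(|m| m.lookups).sum();
    hits as f64 / lookups as f64
}
```

With, say, 9,000 lookups at 87.3% on one model and 1,000 lookups at 0% on another, the unweighted mean is 43.65% while the weighted rate is 78.57%: a low-traffic model with no cache hits drags the simple average down much more than it affects overall traffic.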


If you aren't already aware, Karpathy has several videos that could get you there in a few hours https://www.youtube.com/@AndrejKarpathy

Thanks very much!

Also check out his nanochat repo. I used the repo, Claude, and Shadeform to train my own mini model for about $300. It would have been less, but I screwed up and let the cloud GPU rental run for a few hours after the training run had errored out.

Of course the model was dumber than GPT2 but still it was a great learning experience.


Cloudflare | Systems or ML Engineer, Workers AI | Austin, TX or London, UK or San Francisco, CA (Hybrid) | Full-Time | https://cloudflare.com

Cloudflare is building across the entire AI stack. Here are some exciting things that we have launched recently:

- Kimi K2.5: https://blog.cloudflare.com/workers-ai-large-models/

- Kimi Performance Improvements: https://blog.cloudflare.com/high-performance-llms/

- Kimi K2.6 Speculative Decoding and Shared KV Cache: https://x.com/kevin_flansburg/status/2050238819065299110

- Unweight Tensor Compression: https://blog.cloudflare.com/unweight-tensor-compression/

- Code Mode MCP Servers: https://blog.cloudflare.com/code-mode/

- Agents SDK: https://blog.cloudflare.com/building-agents-with-openai-and-...

- Agent Memory: https://blog.cloudflare.com/introducing-agent-memory/

- Internal AI Tooling: https://blog.cloudflare.com/internal-ai-engineering-stack/

- Release Cog v0.19: https://github.com/replicate/cog/releases/tag/v0.19.0

- Dynamic Worker Sandboxes: https://blog.cloudflare.com/dynamic-workers/

- Dynamic Workflows: https://blog.cloudflare.com/dynamic-workflows/

We are looking for systems and ML engineers to help build our edge inference platform:

- Senior Systems Engineer, Workers AI - https://job-boards.greenhouse.io/cloudflare/jobs/7764827?gh_...

- Senior / Principal Machine Learning Engineer, Workers AI - https://job-boards.greenhouse.io/cloudflare/jobs/6297179?gh_...


Cloudflare | Systems or ML Engineer, Workers AI | Austin, TX or London, UK or San Francisco, CA (Hybrid) | Full-Time | https://cloudflare.com

Cloudflare is building across the entire AI stack. Here are some exciting things that we have launched recently:

- Release Cog v0.17: https://github.com/replicate/cog/releases/tag/v0.17.0

- Dynamic Worker Sandboxes: https://blog.cloudflare.com/dynamic-workers/

- Kimi K2.5: https://blog.cloudflare.com/workers-ai-large-models/

- Code Mode MCP Servers: https://blog.cloudflare.com/code-mode/

- Agents SDK: https://blog.cloudflare.com/building-agents-with-openai-and-...

We are looking for systems and ML engineers to help build our edge inference platform:

- Senior Systems Engineer, Workers AI - https://job-boards.greenhouse.io/cloudflare/jobs/7764827?gh_...

- Senior / Principal Machine Learning Engineer, Workers AI - https://job-boards.greenhouse.io/cloudflare/jobs/6297179?gh_...


> an if let expression over an RWLock assumed (reasonably, but incorrectly) in its else branch that the lock had been released. Instant and virulently contagious deadlock.

I believe this behavior is changing in the 2024 edition: https://doc.rust-lang.org/edition-guide/rust-2024/temporary-...
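A minimal sketch of the pattern being described, assuming a `RwLock<HashMap<..>>` cache; the `get_or_insert` function and its length-based placeholder value are hypothetical. In editions before 2024, putting the lookup directly in the `if let` scrutinee keeps the read guard alive into the `else` branch, so taking the write lock there blocks forever (std's `RwLock::write` documents that it may also panic if the current thread already holds the lock). Binding the lookup result in its own `let` statement drops the guard at the semicolon in every edition.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Looks up `key`, computing and inserting a value on a miss.
/// The value computation (`key.len()`) is a placeholder.
fn get_or_insert(cache: &RwLock<HashMap<String, usize>>, key: &str) -> usize {
    // Risky before the 2024 edition:
    //   if let Some(v) = cache.read().unwrap().get(key) { .. } else { cache.write() .. }
    // keeps the read guard alive through the `else` branch.
    //
    // Copying the value out in a separate statement drops the read guard
    // at the end of this statement, regardless of edition.
    let cached = cache.read().unwrap().get(key).copied();
    if let Some(v) = cached {
        v
    } else {
        // No read guard is alive here, so taking the write lock is safe.
        let mut map = cache.write().unwrap();
        // `entry` also covers the race where another thread inserted first.
        let v = *map.entry(key.to_string()).or_insert_with(|| key.len());
        v
    }
}
```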


> I believe this behavior is changing

Past tense: the 2024 edition was stabilized in Rust 1.85 (and has been the default edition for `cargo new` since then).


Yes, I've already performed the upgrade for my projects, but since they hit this bug, I'm guessing they haven't.


They may have upgraded by now; their source links to a thread from a year ago, before the 2024 edition, which may be when they encountered that particular bug.


I see now that this incident happened in September 2024 as well.


I'd love to see this comparison between Garmin and Oura. I ditched Whoop because it was too easy to get 100 sleep scores, but my Garmin watch is much harder to please. I do think it offers better signal, but I've never scored over a 90, so maybe it is too critical?


I have had scores over 90, but only when I get to bed earlier and when medications are involved. You do feel like you had a great sleep, so for me it's very accurate.


I like Garmin, and I have to keep working to get a good sleep score.

That said, when I do get very good or very poor sleep, the score reflects it.


I've found Garmin's sleep tracking to be unreliable. For instance, if I lie in bed reading before I fall asleep, it often incorrectly logs that time as sleep.


All of the JS features are available in Rust, but some don’t have a first-class SDK API yet and you must use wasm-bindgen.


I believe that your summary misunderstands how we will handle versioning. The Pyodide/package versions will be controlled by the compatibility date, and we will be able to support multiple versions in production at once. For packages like langchain (or numpy, as you mentioned), the plan is to update quite frequently.

Could you expand on why you believe V8 will be a limiting factor? It is quite a powerful Wasm runtime, and most of the optimizations we have planned don’t really depend on the underlying engine.

Edit: Also just want to clarify that this is not a POC, it is a Beta that we will continue improving on and eventually GA.
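A rough sketch of how a compatibility date could select a runtime bundle, assuming a hypothetical release table (the dates and bundle names below are placeholders, not Cloudflare's actual schedule or mechanism). Each Worker gets the newest bundle released on or before its compatibility date, so several bundles can be live at once while older Workers keep the versions they were deployed against.

```rust
/// Hypothetical release table: (first compatibility date, bundle name),
/// sorted by date. ISO-8601 date strings sort correctly as plain strings.
const RELEASES: &[(&str, &str)] = &[
    ("2024-01-01", "bundle-a"),
    ("2024-06-01", "bundle-b"),
    ("2025-01-01", "bundle-c"),
];

/// Picks the newest bundle whose release date is on or before
/// `compat_date`; returns None if the date predates every release.
fn bundle_for(compat_date: &str) -> Option<&'static str> {
    RELEASES
        .iter()
        .filter(|&&(date, _)| date <= compat_date)
        .map(|&(_, bundle)| bundle)
        .last()
}
```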


> pyodide /package versions will be controlled by the compatibility date

That's exactly the issue that I'm mentioning. Ideally you should be able to pin any Python version that you want to use in your app: 2.7, 3.8 or 3.9 regardless of a Workerd compatibility date. Some packages might work in Python 3.11 but not in 3.12, for example.

Unfortunately, Python doesn't have the full transpiler architecture that the JS ecosystem has, and thus "packaging" Python applications into different "compatibility" bundles will prove much more challenging (webpack factor).

> Could you expand on why you believe V8 will be a limiting factor?

Sure thing! I think we probably all agree that V8 is a fantastic runtime. However, the tradeoffs that make V8 great for a browser use case make the runtime more challenging for Edge environments (where servers can run more specialized workloads in trusted environments).

Namely, those are:

  * Cold starts: V8 Isolates are a bit heavy to initialize. In its current form, initializing an Isolate alone can add ~2-5ms to startup
  * Snapshots can be quite heavy to save and restore
  * Not architected with the Edge use case in mind: there are many tricks you can do if you skip the JS middleware and go all-in on a Wasm runtime that are hard to do with the current V8/Workerd architecture.

In any case, I would love to be proven wrong in the long term, and I'm cheering for <100ms cold starts when running Python in Cloudflare Workers. Keep up the good work!


We discussed a separate configuration field for Python version. It’s not technically challenging, this was a design choice we made to simplify configuration for users and encourage more efficiencies in terms of shared dependencies.

Your concerns about V8 would impact JavaScript Workers as well and do not match what we see in production. It is also definitely possible to invoke C++ host functions directly from Wasm with V8.


> Your concerns about V8 would impact JavaScript Workers as well and do not match what we see in production

Interesting! I thought V8 snapshots were mainly used in the Pyodide context, as I could not find any other usage in WorkerD (other than promise tagging and jsg::MemoryTracker).

Are you using V8 snapshots as well for improving cold starts in JS applications?


I was responding to your point about isolates and cold starts. Snapshots are unique to Python, but V8 does not seem relevant there: all a snapshot restore does is initialize the linear buffer that backs Wasm memory for a particular instance. We have a lot of ideas here, some of which are mentioned in the blog post.


Awesome. Eager to see how the product evolves :)


(Cloudflare Workers tech lead here.)

I disagree about V8 not being optimized for edge environments. The needs of a browser are actually very much aligned with the needs of the edge, namely secure sandboxing, extremely fast startup, and an extreme commitment to backwards compatibility (important so that all apps can always run on a single runtime version).

Additionally, V8 is just much better at running JavaScript than you can hope to achieve in a Wasm-based JS implementation. And JavaScript is the most popular web development language (even server-side).

> On it's current form it can add up from ~2-5ms in startup just by initializing an Isolate

So, you and I seemingly have a disagreement on what "cold start" means. Wasmer advertises its own "cold start" time to be 50ns. This is only remotely possible if the application is already loaded in memory and ready to go before the request arrives. In my mind, this is not a "cold start". If the application is already loaded, then it's a "warm start". I haven't spent the time to benchmark our warm start time (TBH I'm a little unclear on what, exactly, is counted in this measurement), but if the app is already loaded, we can complete whole requests in a matter of microseconds, so the 5ms number isn't the correct comparison.

To me, "cold start" time is the time to load an application, without prior knowledge of what application will be needed. That means it includes the time to fetch the application code from storage. For a small application, we get around 5ms.

Note that the time to initialize an isolate isn't actually on the critical path to cold start, since we can pre-initialize isolates and have them ready to go before knowing what application they will run. That said, we haven't implemented this optimization historically, since the benefit would be relatively small.

However, with Pyodide this changes a bit. We can pre-initialize Pyodide isolates, before we know which Python app needs to run. Again, this isn't implemented yet, but we expect the benefits to be much larger than with plain JS isolates, so we plan to do so.
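The distinction between initializing a generic isolate ahead of time and loading a specific app on demand can be sketched as a warm pool. Everything here is hypothetical: `Isolate` stands in for whatever runtime state is expensive to create (a bare isolate, or one with Pyodide already loaded), and `fetch_code` stands in for pulling the application from storage, which stays on the cold-start path either way.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for an expensive-to-create runtime instance.
struct Isolate {
    app: Option<String>, // None until an application is loaded into it
}

impl Isolate {
    /// Expensive, app-independent setup (e.g. runtime initialization).
    fn new_generic() -> Self {
        Isolate { app: None }
    }
}

/// Hypothetical: fetch application code from storage.
fn fetch_code(app_id: &str) -> String {
    format!("code for {app_id}")
}

struct WarmPool {
    ready: VecDeque<Isolate>,
}

impl WarmPool {
    /// Pre-initialize `n` isolates before knowing which apps they will run.
    fn with_capacity(n: usize) -> Self {
        WarmPool { ready: (0..n).map(|_| Isolate::new_generic()).collect() }
    }

    /// Cold start for `app_id`: reuse a pre-initialized isolate if one is
    /// available (otherwise create one on the critical path), then fetch and
    /// load the app's code. Returns the isolate and whether it came from the pool.
    fn cold_start(&mut self, app_id: &str) -> (Isolate, bool) {
        let (mut iso, from_pool) = match self.ready.pop_front() {
            Some(iso) => (iso, true),
            None => (Isolate::new_generic(), false),
        };
        iso.app = Some(fetch_code(app_id));
        (iso, from_pool)
    }

    /// Refill the pool off the request path.
    fn refill(&mut self, target: usize) {
        while self.ready.len() < target {
            self.ready.push_back(Isolate::new_generic());
        }
    }
}
```

The design choice being illustrated is that only app-independent work can move off the critical path; fetching and loading the specific application still happens per request.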

> Ideally you should be able to pin any Python version that you want to use in your app:

Minimizing application size is really essential to making edge compute inexpensive -- to run every one of two million developers' applications in every one of our hundreds of locations at a reasonable price, we need to be able to run thousands of apps simultaneously on each machine. If each one bundles its entire language runtime, that's not gonna fit. That does mean that many applications have to agree to use the same versions of common runtime libraries, so that they can share the same copies of that code. The goal is to keep most updates to Pyodide backwards-compatible so that we can just keep everyone on the latest version. When incompatible changes must be made, we'll have to load multiple versions per machine, but that's still better than one copy per app.


Hey Kenton, great to see you chiming in here as well!

> Additionally, V8 is just much better at running JavaScript than you can hope to achieve in a Wasm-based JS implementation. And JavaScript is the most popular web development language (even server-side).

I agree with this statement as of today. Stay tuned, because very cool things are coming in Wasm land (SpiderMonkey will soon support JITted workloads inside of Wasm, bringing its speed much closer to V8!)

> Note that the time to initialize an isolate isn't actually on the critical path to cold start, since we can pre-initialize isolates and have them ready to go before knowing what application they will run

That's a good point. Although you are now kind of optimizing the critical path to cold start by knowing what the app is running (if it is Python, restore it from a snapshot). So even if isolate initialization is not on the critical path, I would assume there are other things on it that account for the extra second of latency in Python cold starts.

> Minimizing application size is really essential to making edge compute inexpensive

By leveraging properly defined dependencies, you only need to compile and load the dependency module (let's say Python) into memory once, and then you have "infinite" capacity for initializing it. Basically, if you take Python out of the picture and treat it as a dependency of an app, then you can suddenly scale apps as much as you want!

For example: having 10 Python versions (running thousands of apps) will have an overhead of 5 MB (average Python binary size) * 10 versions ~= 50 MB (plus memory for each initialization of the app, which is required in either strategy), so the overhead of pinning a specific Python version should be truly minimal on the server (at least when fully leveraging a Wasm runtime).
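The arithmetic above, set next to the per-app bundling that the previous comment says won't fit, as a quick Rust calculation. The 5 MB binary size and 10 versions come from the comment; 1,000 apps per machine is a hypothetical stand-in for "thousands of apps", and per-app instance memory is left out because both strategies need it.

```rust
/// Runtime memory (MB) if every app bundles its own copy of the runtime.
fn bundled_mb(apps: u64, runtime_mb: u64) -> u64 {
    apps * runtime_mb
}

/// Runtime memory (MB) if apps share one loaded copy per runtime version.
fn shared_mb(versions: u64, runtime_mb: u64) -> u64 {
    versions * runtime_mb
}
```

With those inputs, bundling costs 1,000 * 5 = 5,000 MB of runtime code per machine, while sharing 10 versions costs 10 * 5 = 50 MB. The shared figure grows with the number of distinct versions kept loaded, not with the number of apps.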


Are people maintaining WASI ports of Python 2.7 and 3.8?



You can find an IV that makes sense for a single option even when the other parameters are wrong, but things will break down when you go to price other expirations / strikes.

When trading, you don't want to wait to see an "updated" IV, you would want to respond directly to changes in important and well understood parameters like underlying price.
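To illustrate the first point, here is a minimal Black-Scholes sketch in Rust (European calls, no dividends) that backs out implied volatility by bisection. The normal CDF uses the Abramowitz-Stegun 7.1.26 approximation of `erf`, since Rust's standard library has none; the spot, strikes, rate, and maturity used in the check are hypothetical. Solving with a wrong interest rate still yields an IV that reproduces one option's price, but the IVs implied at different strikes then disagree, so no single vol prices the whole chain.

```rust
/// Standard normal CDF via the Abramowitz-Stegun 7.1.26 approximation
/// of erf (absolute error in erf below about 1.5e-7).
fn norm_cdf(x: f64) -> f64 {
    let z = x / std::f64::consts::SQRT_2;
    let sign = if z < 0.0 { -1.0 } else { 1.0 };
    let z = z.abs();
    let t = 1.0 / (1.0 + 0.3275911 * z);
    let poly = t * (0.254829592
        + t * (-0.284496736 + t * (1.421413741 + t * (-1.453152027 + t * 1.061405429))));
    let erf = 1.0 - poly * (-z * z).exp();
    0.5 * (1.0 + sign * erf)
}

/// Black-Scholes price of a European call (no dividends).
/// s: spot, k: strike, r: continuously compounded rate,
/// sigma: volatility, t: years to expiry.
fn bs_call(s: f64, k: f64, r: f64, sigma: f64, t: f64) -> f64 {
    let sqrt_t = t.sqrt();
    let d1 = ((s / k).ln() + (r + 0.5 * sigma * sigma) * t) / (sigma * sqrt_t);
    let d2 = d1 - sigma * sqrt_t;
    s * norm_cdf(d1) - k * (-r * t).exp() * norm_cdf(d2)
}

/// Implied volatility by bisection: the call price increases with sigma,
/// so we shrink [lo, hi] around the sigma that matches the market price.
fn implied_vol(price: f64, s: f64, k: f64, r: f64, t: f64) -> f64 {
    let (mut lo, mut hi) = (1e-6, 5.0);
    for _ in 0..100 {
        let mid = 0.5 * (lo + hi);
        if bs_call(s, k, r, mid, t) < price {
            lo = mid;
        } else {
            hi = mid;
        }
    }
    0.5 * (lo + hi)
}
```

For example, price an at-the-money and an out-of-the-money call with a 5% rate and 20% vol, then back out IVs assuming a 0% rate: each strike gets an IV that reproduces its own price, but the two IVs differ noticeably, which is the breakdown described above.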

