It is kind of a fundamental risk of IMDS, the guest vms often need some metadata...

axelriet · 2026-04-03T04:25:33 1775190333

You could imagine hosting the metadata service somewhere else. After all there is nothing a node knows about a VM that the fabric doesn’t. And things like certificates comes from somewhere anyway, they are not on the node so that service is just cache.

cyberax · 2026-04-03T08:11:45 1775203905

Hosting IMDS on the host side is pretty much the only reasonable way to provide stability guarantees. It should still work even if the network is having issues.

That being said, IMDS on AWS is a dead simple key-value storage. A competent developer should be able to write it in a memory-safe language in a way that can't be easily exploited.

axelriet · 2026-04-03T10:35:05 1775212505

“No, there is another”—Yoda, The Empire Strikes Back :)

What you describe carries the risk that secrets end up in crash dumps and be exfiltrated.

Imagine an attacker owns the host to some extent and can do that. The data is then on disk first, then stored somewhere else.

You probably need per-tenant/per-VM encryption in your cache, since you can never protect against someone with elevated privileges from crashing or dumping your process, memory-safe or not.

Then someone can try to DoS you, etc.

Finally it’s not good practice to mix tenant’s secrets in hostile multi-tenancy environments, so you probably need a cache per VM in separate processes…

IMHO, an alternative is to keep the VM's private data inside the VMs, not on the host.

Then the real wtf is the unsecured HTTP endpoint, an open invitation for “explorations” of the host (or the SoC when they get there) on Azure.

eBPF+signing agent helps legitimate requests but does nothing against attacks on the server itself; say, you send broken requests hoping to hit a bug. It does not matter if they are signed or not.

This is a path to own the host, an unnecessary risk with too many moving parts.

Many VM escapes abuse a device driver, and I trust the kernel guys who write them a lot more than the people who write hostable web servers running inproc on the host.

Removing these was a subject of intense discussions (and pushbacks from the owning teams) but without leaking any secret I can tell you that a lot of people didn’t like the idea of a customer-facing web server on the nodes.

cyberax · 2026-04-03T17:32:23 1775237543

Of course, putting the metadata service into its own separate system is better. That's how Amazon does it with the modern AWS. A separate Nitro card handles all the networking and management.

But if you're within the classic hypervisor model, then it doesn't really matter that much. The attack surface of a simple plain HTTP key-value storage is negligible compared to all other privileged code that needs to run on the host.

Sure, each tenant needs to have its own instance of the metadata service, and it should be bound to listen on the tenant-specific interface. AWS also used to set the max TTL on these interface to 1, so the packets would be dropped by routers.

axelriet · 2026-04-06T20:59:31 1775509171

>negligible attack surface of a simple-plain HTTP…

…unless you use a general-purpose web server with its own set of challenges as far as policies and configuration. I’ll leave it there.

jmogly · 2026-04-03T05:24:27 1775193867

Ah yes great point, awesome article by the way —- thought provoking, shocking, really crazy stuff. Hopefully some good comes of it, godspeed.