More

bmodel · on Aug 11, 2024

We haven't tested it with hashcat yet but plan on doing so. If you get to it before us please let us know how it works!

bmodel · on Aug 9, 2024

That understanding of the system is correct. To make it practical we've implemented a bunch of optimizations to minimize I/O cost. You can see how it performs on inference with BERT here: https://youtu.be/qsOBFQZtsFM?t=69.

The overheads are larger for training compared to inference, and we are implementing more optimizations to approach native performance.

semitones · on Aug 10, 2024

> to approach native performance.

The same way one "approaches the sun" when they take the stairs?

the8472 · on Aug 10, 2024

I guess there's non-negligible optimization potential, e.g. by doing hash-based caching. If the same data gets uploaded twice they can have the blob already sitting somewhere closer to the machine.

ruined · on Aug 10, 2024

yes. my building has stairs and i find them useful because usually i don't need to go to the sun

radarsat1 · on Aug 9, 2024

Aah ok thanks, that was my basic misunderstanding, my mind just jumped straight to my current training needs but for inference it makes a lot of sense. Thanks for the clarification.

ranger_danger · on Aug 10, 2024

Is DirectX support possible any time soon? This would be huge for Windows VMs on Linux...

zozbot234 · on Aug 10, 2024

You could use unofficial Windows drivers for virtio-gpu, that are specifically intended for VM use.

ranger_danger · on Aug 10, 2024

Indeed, it's just that it's not very feature-complete or stable yet.

bmodel · on Aug 9, 2024

Awesome, we'd love to chat! You can reach us at founders@thundercompute.com or join the discord https://discord.gg/nwuETS9jJK!

bmodel · on Aug 9, 2024

At the moment out tech is linux-only so it would not work with Blender.

Down the line, we could see this being used for batched render jobs (i.e. to replace a render farm).

comex · on Aug 9, 2024

Blender can run on Linux…

bmodel · on Aug 9, 2024

Oh nice, I didn't know that! In that case it might work, you could try running `tnr run ./blender` (replace the ./blender with how you'd launch blender from the CLI) to see what happens. We haven't tested it so I can't make promises about performance or stability :)

chmod775 · on Aug 10, 2024

Disclaimer: I only have a passing familiarity with Blender, so I might be wrong on some counts.

I think you'd want to run the blender GUI locally and only call out to a headless rendering server ("render farm") that uses your service under the hood to get the actual render.

This separation is already something blender supports, and you could for instance use Blender on Windows despite your render farm using Linux servers.

Cloud rendering is adjacent to what you're offering, and it should be trivial for you to expand into that space by just figuring out the setup and preparing a guide for users wishing to do that with your service.

bmodel · on Aug 9, 2024

We have tested this with pytorch and huggingface and it is mostly stable (we know there are issues with pycuda and jax). In theory this should work with any libraries, however we're still actively developing this so bugs will show up

bmodel · on Aug 9, 2024

We're still in our beta so it's entirely free for now (we can't promise a bug-free experience)! You have to make an account but it won't require payment details.

Down the line we want to move to a pay-as-you-go model.

bmodel · on Aug 9, 2024

We haven't tested with MIG or vGPU, but I think it would work since it's essentially physically partitioning the GPU.

One of our main goals for the near future is to allow GPU sharing. This would be better than MIG or vGPU since we'd allow users to use the entire GPU memory instead of restricting them to a fraction.

tptacek · on Aug 9, 2024

We had a hell of a time dealing with the licensing issues and ultimately just gave up and give people whole GPUs.

What are you doing to reset the GPU to clean state after a run? It's surprisingly complicated to do this securely (we're writing up a back-to-back sequence of audits we did with Atredis and Tetrel; should be publishing in a month or two).

bmodel · on Aug 9, 2024

We kill the process to reset the GPU. Since we only store GPU state that's the only clean up we need to do

tptacek · on Aug 9, 2024

Hm. Ok. Well, this is all very cool! Congrats on shipping.

azinman2 · on Aug 10, 2024

Won’t the VRAM still contain old bits?

bmodel · on Aug 9, 2024

Exactly! The finer grain sharing is one of the key things on our radar right now

goku-goku · on Aug 9, 2024

www.juicelabs.co does all this today, including the GPU sharing and fractionalization.

ranger_danger · on Aug 10, 2024

the free community version has been discontinued, and also doesn't support a linux client with non-CUDA graphics, regardless of the server OS, which is a non-starter for me

bmodel · on Aug 9, 2024

Great point, there are a few benefits:

1. If you're actively developing and need a GPU then you typically would be paying the entire time the instance is running. Using Thunder means you only pay for the GPU while actively using it. Essentially, if you are running CPU only code you would not be paying for any GPU time. The alterative for this is to manually turn the instance on and off which can be annoying.

2. This allows you to easily scale the type and number of GPUs you're using. For example, say you want to do development on a cheap T4 instance and run a full DL training job on a set of 8 A100. Instead of needing to swap instances and setup everything again, you can just run a command and then start running on the more powerful GPUs.

doctorpangloss · on Aug 9, 2024

Okay, but your GPUs are in ECS. Don't I just want this feature from Amazon, not you, and natively via Nitro? Or even Google has TPU attachments.

> 1. If you're actively developing and need a GPU [for fractional amounts of time]...

Why would I need a GPU for a short amount of time during development? For testing?

I don't get it - what would testing an H100 over a TCP connection tell me? It's like, yeah, I can do that, but it doesn't represent an environment I am going to use for real. Nobody runs applications to GPUs on buses virtualized over TCP connections, so what exactly would I be validating?

bmodel · on Aug 9, 2024

I don't believe Nitro would allow you to access a GPU that's not directly connected to the CPU that the VM is running on. So swapping between GPU type or scaling to multiple GPUs is still a problem.

From the developer perspective, you wouldn't know that the H100 is across a network. The experience will be as if your computer is directly attached to an H100. The benefit here is that if you're not actively using the H100 (such as when you're setting up the instance or after the training job completes) you are not paying for the H100.

doctorpangloss · on Aug 9, 2024

Okay, a mock H100 object would also save me money. I could pretend a 3090 is an A100. “The experience would be that a 3090 is an A100.” Apples to oranges comparison? It’s using a GPU attached to the machine versus a GPU that crosses a VPC boundary. Do you see what I am saying?

I would never run a training job on a GPU virtualized over TCP connection. I would never run a training job that requires 80GB of VRAM on a 24GB VRAM device.

Whom is this for? Who needs to save kopecks on a single GPU who needs H100s?

teaearlgraycold · on Aug 9, 2024

I develop GPU accelerated web apps in an EC2 instance with a remote VSCode session. A lot of the time I’m just doing web dev and don’t need a GPU. I can save thousands per month by switching to this.

amelius · on Aug 10, 2024

Sounds like you can save thousands by just buying a simple GPU card.

teaearlgraycold · on Aug 10, 2024

Well, for the time being I'm really just burning AWS credits. But you're right! I do however like that my dev machine is the exact same instance type in the same AWS region as my production instances. If I built an equivalent machine it would have different performance characteristics. Often times the AWS VMs have weird behavior that I would otherwise be caught off guard with when deploying to the cloud for the first time.

bmodel · on Aug 9, 2024

Thank you!

> is this a remote nvapi

Essentially yes! Just to be clear, this covers the entire GPU not just the NVAPI (i.e. all of cuda). This functions like you have the physical card directly plugged into the machine.

Right now we don't support vulkan or opengl since we're mostly focusing on AI workloads, however we plan to support these in the future (especially if there is interest!)

billconan · on Aug 9, 2024

sorry, I didn't mean nvapi, I meant rmapi.

I bet you saw this https://github.com/mikex86/LibreCuda

they implemented the cuda driver by calling into rmapi.

My understanding is if there is a remote rmapi, other user mode drivers should work out of the box?