KGPU - Augmenting Linux with the CUDA GPU

gnu8 · on Dec 16, 2012

Why is this CUDA and not OpenCL? There's no reason to legitimize NVIDIA's proprietary nonsense; it just enables their bad behavior.

fool · on Dec 16, 2012

The GPU code seems to be relatively isolated to memory operations in gpuops.cu. I'm not altogether sure, but a quick review suggests that such mapping is supported by OpenCL, so one could rewrite that module and the whole thing would work without CUDA. Of course, in terms of compilers, it is going to be a while before users can move away from nvcc for NVIDIA graphics card support.

However, like the parent, I'd really like to see a generic OpenCL vectorization kernel module. I strongly suspect this work was directly or indirectly underwritten by Nvidia, so I guess someone (Intel?) needs to step up and fund similar academic projects.

fool · on Dec 16, 2012

Suspicion confirmed:

"KGPU is a project of the Flux Research Group at the University of Utah. It is supported by NVIDIA through a graduate fellowship awarded to Weibin Sun."

http://code.google.com/p/kgpu/

dfc · on Dec 16, 2012

It really makes you wonder why AMD does not do the same thing; the hardware is essentially free, and how much could a fellowship cost? Does anyone know if research grants like this are a tax write-off?

iso-8859-1 · on Dec 16, 2012

CUDA is older and more popular: http://www.google.com/trends/explore#cat=0-5&q=opencl%2C...

snogglethorpe · on Dec 17, 2012

Nonetheless, it's a bad idea to favor proprietary languages that lock you into a particular company's products.

CUDA should have died when OpenCL came about.

Of course, since such lock-in is to Nvidia's benefit, it's understandable why they keep promoting their proprietary solution...

DannyBee · on Dec 17, 2012

What, you mean instantly? What if OpenCL sucked? Should CUDA still die?

hyperbovine · on Dec 17, 2012

What sort of vitriol is this? How dare they invent a new technology and build an API to it.

DannyBee · on Dec 17, 2012

Didn't you know all good APIs are designed from scratch in open committees, instead of being standardized after multiple competing implementations?

greggman · on Dec 17, 2012

You do NOT want to do this. Unlike CPUs, which can be preempted, current GPUs cannot. If you give them 2 to 30 minutes of work, they will not return until that work is done. Windows gets around this by resetting the GPU if it doesn't respond for more than a few seconds. Linux and OS X have no such luck, at least not yet.

And once the GPU has been reset, its state is often unknown. Not a good thing if you are embedding GPU code into your kernel.

adamgravitis · on Dec 16, 2012

Where is this useful? The bus to the GPU is so slow that I thought offloading was only meaningful for near-autonomous operations.

Tuna-Fish · on Dec 16, 2012

Right now, the bandwidth to modern GPUs is actually pretty decent (16GB/s bidirectional), but the latency is still horrid. This means that you need rather large operations for offloading to pay off. I think doing RAID-5 or full-disk encryption with large blocks might just barely be worth it.
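The latency point can be made concrete with a simple cost model: moving one block takes roughly a fixed latency plus size divided by bandwidth. A minimal sketch, assuming the 16 GB/s figure from the comment and a hypothetical 10 µs round-trip latency (an illustrative number, not a measurement of any real card):

```python
def transfer_time(size_bytes: float, bandwidth: float, latency: float) -> float:
    """Seconds to move one block: fixed latency plus size / bandwidth (bytes/s)."""
    return latency + size_bytes / bandwidth


def effective_throughput(size_bytes: float, bandwidth: float, latency: float) -> float:
    """Bytes per second actually achieved for a single transfer of size_bytes."""
    return size_bytes / transfer_time(size_bytes, bandwidth, latency)


BANDWIDTH = 16e9  # bytes/s, the figure quoted in the comment
LATENCY = 10e-6   # seconds; hypothetical latency, for illustration only

# At size = LATENCY * BANDWIDTH, half of the time is spent waiting on latency,
# so the transfer achieves exactly half of the link bandwidth.
HALF_BANDWIDTH_SIZE = LATENCY * BANDWIDTH  # about 160,000 bytes with these numbers

if __name__ == "__main__":
    for size in (4 * 1024, 64 * 1024, 1024 * 1024, 16 * 1024 * 1024):
        gbps = effective_throughput(size, BANDWIDTH, LATENCY) / 1e9
        print(f"{size:>10} bytes -> {gbps:5.2f} GB/s effective")
```

With these assumed numbers a 4 KiB block gets well under 1 GB/s, while multi-megabyte blocks get close to the full link rate, which is why only large operations are likely to pay off.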

However, with AMD and Intel integrated GPUs, this is about to change. AMD is doing a lot of work on HSA, which can be summarized as "GPU and CPU share same memory, and can communicate by passing pointers". I can see this kind of work being really useful in the near future.

dfc · on Dec 16, 2012

More and more CPUs have AES instructions, and my old Lenovo IdeaPad has a crypto coprocessor. Do you think GPU offload will be worth it when the system has hardware-accelerated crypto?

tjoff · on Dec 16, 2012

The question should be whether dedicated hardware for accelerated crypto will be worth it when GPU offload is suitable for it.

Although in that particular context (security), you might enjoy the isolation of dedicated hardware as opposed to sharing it with others. The GPU solution, though, has the advantage of being able to adapt to new ciphers, etc.

Tuna-Fish · on Dec 16, 2012

Specialized, single-purpose hardware is currently some 10x more energy-efficient than a GPU for the same task (and some 50x more efficient than a CPU). Given that modern chips are limited not by transistor density but by energy density, we're going to see more special-purpose hardware in our chips, not less.

dfc · on Dec 16, 2012

Adapting/implementing the newest and hottest cipher on the block is not something that the crypto community advocates. Do you really think crypto accelerated hardware is going to fall behind and not support the ciphers that the crypto community (academia/industry) endorses?

tjoff · on Dec 17, 2012

Since VIA introduced PadLock, I've been waiting, what, a decade, for hardware-accelerated encryption in mainstream CPUs. Only recently has basic support been introduced, and only for AES (and if I'm not mistaken, which I probably am, PadLock is vastly superior to the offerings of AMD and Intel :P).

So yes, crypto-accelerated hardware is behind, does not support the ciphers that the crypto community (academia/industry) endorses, and in all likelihood will never bother to catch up, since doing it on the GPU will be good enough. Even if it takes another decade.

dfc · on Dec 17, 2012

What algos are you missing?

The AES-NI instruction set was proposed in 2008, and the first Intel CPUs supporting it started shipping almost three years ago.[1] Soekris has had the vpnXXXX crypto accelerators for as long as I can remember.[2]

[1] http://ark.intel.com/search/advanced/?s=t&AESTech=true
[2] http://soekris.com/

tjoff · on Dec 17, 2012

Blowfish or Twofish wouldn't hurt. But I'd be happy with AES; too bad none of my devices have hardware acceleration for it.

The fact that it was proposed in 2008 is quite telling by itself. And when Intel introduced it, it was in their high-end product lines; finding a processor where AES-NI is less needed would be a challenge.

A suitable integrated GPU would penetrate the market much better and ultimately reach products that today use the Atom processor: the very product segment where you can barely use encryption today (in contrast to an i7, which saturates a fast SSD with AES encryption throughput without breaking a sweat, even without hardware acceleration).

A similar solution would also most likely allow me to encrypt files on my phone without a large impact, even if the manufacturer couldn't care less about security features.

Hoff · on Dec 16, 2012

That depends on which team you're playing for, no?

dfc · on Dec 16, 2012

I am not trying to be difficult, but I have no idea what you are talking about. I looked through your past comments and you seem to be a competent commenter; can you clarify what you meant?

Hoff · on Dec 20, 2012

Whether you are attacking the encryption, or whether you're a user of encryption, i.e. a defender.

If you're attacking, having flexibility is advantageous.

patrickgzill · on Dec 16, 2012

I had thought in the past that storing the index of a database (not the data, just index) on the card and using that to handle complex queries, might be interesting. Not sure if that has a practical, real-world use though.

caf · on Dec 17, 2012

Not so much RAID-5, which is just an XOR operation that is as good as free on a modern CPU, but RAID-6, where a more computation-intensive Reed-Solomon code is used.
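To illustrate the difference: RAID-5 parity P is a plain byte-wise XOR across the data blocks, while RAID-6 adds a second syndrome Q computed in the Galois field GF(2^8). The sketch below follows the scheme described for Linux md RAID-6 (generator {02}, reduction polynomial 0x11d); it is a byte-at-a-time illustration with toy two-byte "disks", not an optimized implementation.

```python
from typing import Sequence


def gf_mul2(x: int) -> int:
    """Multiply a byte by the generator {02} in GF(2^8), reducing by x^8+x^4+x^3+x^2+1 (0x11d)."""
    x <<= 1
    if x & 0x100:
        x ^= 0x11D  # clears bit 8 and folds in the low terms 0x1d
    return x


def p_parity(blocks: Sequence[bytes]) -> bytes:
    """RAID-5 style parity P: byte-wise XOR of all blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)


def q_syndrome(blocks: Sequence[bytes]) -> bytes:
    """RAID-6 syndrome Q = D0 + g*D1 + g^2*D2 + ... over GF(2^8), via Horner's rule."""
    out = bytearray(len(blocks[0]))
    for block in reversed(blocks):
        for i, b in enumerate(block):
            out[i] = gf_mul2(out[i]) ^ b
    return bytes(out)


if __name__ == "__main__":
    data = [b"\x01\x10", b"\x02\x20", b"\x04\x40"]  # three toy two-byte "disks"
    p, q = p_parity(data), q_syndrome(data)
    # One lost block is rebuilt from the survivors plus P using XOR alone.
    assert p_parity([data[0], data[2], p]) == data[1]
    print(p.hex(), q.hex())
```

Rebuilding two lost blocks needs Q as well, which means multiplications and inversions in GF(2^8) rather than plain XOR; that is the heavier arithmetic that makes RAID-6 the more plausible offload candidate.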

dkhenry · on Dec 16, 2012

One situation where you might see immediate results is high-volume routing. I'm thinking of this project [1]: they were using the GPU to saturate multiple 10GbE interfaces with a commodity processor.

[1] http://shader.kaist.edu/packetshader/

rincebrain · on Dec 17, 2012

I actually spoke to them about doing a similar thing, and they said they were using a GPU because they were pushing lots of packets smaller than the 1500-byte MTU, and that commodity hardware could probably saturate multiple 10GbE interfaces carrying only large, high-throughput packets.
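The reason small packets are the hard case is per-packet cost: at a fixed line rate, smaller frames mean many more packets per second to process. A back-of-the-envelope sketch, assuming standard Ethernet framing with 20 bytes of per-frame overhead on the wire (8-byte preamble/SFD plus 12-byte inter-frame gap):

```python
LINE_RATE_BPS = 10e9    # 10GbE line rate, bits per second
WIRE_OVERHEAD = 8 + 12  # preamble/SFD + inter-frame gap, bytes per frame


def packets_per_second(frame_bytes: int, line_rate_bps: float = LINE_RATE_BPS) -> float:
    """Maximum frames per second on one link for a given Ethernet frame size."""
    return line_rate_bps / ((frame_bytes + WIRE_OVERHEAD) * 8)


if __name__ == "__main__":
    # 64 bytes is the minimum Ethernet frame; 1518 bytes is a full frame
    # for a 1500-byte MTU (1500 payload + 14 header + 4 FCS).
    for frame in (64, 512, 1518):
        print(f"{frame:>5}-byte frames: {packets_per_second(frame) / 1e6:6.2f} Mpps")
```

Under these assumptions, minimum-size frames arrive roughly 18 times as often as full-size ones (about 14.9 Mpps versus 0.81 Mpps per 10GbE port), leaving under 70 ns of processing per packet per port, which is where extra parallel hardware starts to matter.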

jws · on Dec 16, 2012

The README suggests RAID processing, file system encryption, and AES.

GIFtheory · on Dec 16, 2012

Agreed. I'm also struggling to see what kind of massively parallel operations need to be done in kernel space in the first place.

jws · on Dec 16, 2012

Maybe it doesn't have to saturate the GPU to be a win. If you can just banish some cache-busting streaming work, like RAID processing, to a tiny sliver of the GPU, it could still be a win.

mtgx · on Dec 16, 2012

Would this be possible with OpenCL, too?

odranoelson · on Dec 16, 2012

I wonder whether this will conflict with other user applications using the GPU.

wavesum · on Dec 17, 2012

I don't understand :)

Could someone explain what this kind of technology means in practice? Does this mean I can GPU accelerate my old code?