The GPU code seems to be relatively isolated to memory operations in gpuops.cu. I'm not altogether sure, but a quick review suggests such mapping is supported by OpenCL. So one could rewrite that module and the whole thing would work without CUDA. Of course, in terms of compilers, it is going to be a while before users can move away from nvcc for Nvidia graphics card support.
However, as with the parent, I'd really like to see a generic OpenCL vectorization kernel module. I strongly suspect that this work was directly or indirectly underwritten by Nvidia, so I guess someone (Intel?) needs to step up and fund similar academic projects.
"KGPU is a project of the Flux Research Group at the University of Utah. It is supported by NVIDIA through a graduate fellowship awarded to Weibin Sun."
It really makes you wonder why AMD does not do the same thing; the hw is essentially free, and how much could a fellowship cost? Does anyone know if research grants like this are a tax write-off?
You do NOT want to do this. Unlike the CPU, which can be preempted, current GPUs cannot. If you give them 2-30 minutes of work to do, they will not return until that work is done. Windows gets around this by resetting the GPU if it doesn't respond for more than a few seconds. Linux/OS X have no such luck, at least not yet.
But once reset, the state of the GPU is often unknown. Not a good thing if you are embedding GPU code into your kernel.
Right now, the bandwidth to modern GPUs is actually pretty decent (16GB/s bidirectional), but the latency is still horrid. This means that you need rather large operations for offloading to pay off. I think doing raid-5 or full disk encryption with large blocks might just barely be worth it.
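To put rough numbers on "large operations", here is a back-of-the-envelope sketch of the break-even block size. Everything in it is an assumption for illustration: the model (fixed launch latency plus a round trip over the link plus GPU compute time), the function names, and the sample figures. The 8 GB/s per direction is only my reading of "16GB/s bidirectional"; the latency and throughput values are placeholders, not measurements.

```python
# Hypothetical cost model: offloading pays a fixed latency, moves the data
# to the GPU and back over the link, and then processes it at the GPU rate.

def offload_time(size_bytes, latency_s, link_bw, gpu_rate):
    """Seconds to process size_bytes on the GPU, including the round trip."""
    return latency_s + 2 * size_bytes / link_bw + size_bytes / gpu_rate


def cpu_time(size_bytes, cpu_rate):
    """Seconds to process size_bytes on the CPU (no transfer needed)."""
    return size_bytes / cpu_rate


def break_even_size(latency_s, link_bw, gpu_rate, cpu_rate):
    """Smallest block size (bytes) at which offloading stops being slower.

    Returns None if the CPU is faster per byte than the link plus the GPU,
    in which case no block size makes offloading worthwhile.
    """
    per_byte_gain = 1 / cpu_rate - 2 / link_bw - 1 / gpu_rate
    if per_byte_gain <= 0:
        return None
    return latency_s / per_byte_gain


if __name__ == "__main__":
    # Placeholder figures: 20 us latency, 8 GB/s link per direction,
    # 40 GB/s on the GPU, 1 GB/s on the CPU.
    size = break_even_size(20e-6, 8e9, 40e9, 1e9)
    print("break-even block size in bytes:", size)
```

The point of the sketch is the shape of the result: the break-even size grows linearly with latency, and if the CPU can already process data at close to half the link rate, no block size wins.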
However, with AMD and Intel integrated GPUs, this is about to change. AMD is doing a lot of work on HSA, which can be summarized as "GPU and CPU share same memory, and can communicate by passing pointers". I can see this kind of work being really useful in the near future.
More and more CPUs have AES instructions, and my old Lenovo IdeaPad has a crypto coprocessor. Do you think the GPU offload will be worth it when the system has hw accelerated crypto?
The question should be whether dedicated hardware for accelerated crypto will be worth it when GPU offload is suitable for it.
Although in that particular context (security) you might enjoy the isolation of dedicated hardware as opposed to sharing it with others. The GPU solution though of course has the advantage of being able to adapt to new ciphers etc.
Specialized, single-purpose hardware is right now some 10x more energy-efficient for the same task than a GPU, (and some 50x more efficient than a CPU). Given that modern chips are not limited by transistor density but by energy density, we're going to see more special-purpose hardware in our chips, not less.
Adapting/implementing the newest and hottest cipher on the block is not something that the crypto community advocates. Do you really think crypto accelerated hardware is going to fall behind and not support the ciphers that the crypto community (academia/industry) endorses?
I've been waiting for what, a decade, since VIA introduced PadLock, to get hardware-accelerated encryption in mainstream CPUs. And just recently basic support has been introduced, but only for AES and nothing else (and if I'm not mistaken (I probably am), the PadLock is vastly superior to the offerings of AMD and Intel :P).
So yes, crypto accelerated hardware is behind and does not support the ciphers that the crypto community (academia/industry) endorses, and in all likelihood will never bother to catch up, since doing it on the GPU will be good enough. Even if it takes another decade.
The AES-NI instruction set was proposed in 2008 and the first intel cpus started shipping almost three years ago.[1] Soekris has had the vpnXXXX crypto accelerators since as long as I can remember.[2]
Blowfish or twofish wouldn't hurt. But I'd be happy with AES, too bad none of my devices have hardware acceleration for it.
The fact that it was proposed in 2008 is quite telling by itself. And when Intel introduced it, it was in their high-end product lines; finding a processor where AES-NI is less needed would be a challenge.
A suitable integrated GPU would penetrate the market much better and ultimately support products which today use the Atom processor. That is the very same product segment where you can barely use encryption today (in contrast to an i7, which saturates a fast SSD in AES encryption throughput without breaking a sweat, and without hardware acceleration).
A similar solution would also most likely allow me to encrypt files on my phone without a large impact, even if the manufacturer couldn't care less about security features.
I am not trying to be difficult, but I have no idea what you are talking about. I looked through your past comments and you seem to be a competent commenter; can you clarify what you meant?
I had thought in the past that storing the index of a database (not the data, just index) on the card and using that to handle complex queries, might be interesting. Not sure if that has a practical, real-world use though.
Not so much RAID5, which is just an XOR operation that is as good as free on a modern CPU, but RAID6 where a more computation-intensive Reed-Solomon code is used.
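For anyone who hasn't looked at the math, a minimal sketch of the difference. This is plain Python for illustration, not the kernel's implementation, and the function names are made up. I'm assuming the usual RAID6 construction: P is the plain XOR that RAID5 also uses, and Q is a Reed-Solomon syndrome over GF(2^8) with polynomial 0x11d and generator 2.

```python
def p_parity(blocks):
    """RAID5-style parity: byte-wise XOR of all blocks."""
    out = bytearray(len(blocks[0]))
    for blk in blocks:
        for i, byte in enumerate(blk):
            out[i] ^= byte
    return bytes(out)


def gf_mul2(x):
    """Multiply a byte by 2 in GF(2^8), reducing by the polynomial 0x11d."""
    x <<= 1
    if x & 0x100:
        x ^= 0x11D
    return x & 0xFF


def q_parity(blocks):
    """RAID6-style Q syndrome: XOR of (2**i * D_i) over GF(2^8).

    Evaluated with Horner's rule, so each data byte costs one field
    multiply by 2 plus one XOR, instead of just one XOR as in p_parity.
    """
    out = bytearray(len(blocks[0]))
    for blk in reversed(blocks):
        for i, byte in enumerate(blk):
            out[i] = gf_mul2(out[i]) ^ byte
    return bytes(out)


if __name__ == "__main__":
    data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
    p = p_parity(data)
    # A single lost block is the XOR of the survivors and P.
    rebuilt = p_parity([data[0], data[2], p])
    print("rebuilt block matches:", rebuilt == data[1])
    print("Q syndrome:", q_parity(data).hex())
```

The extra shift, test, and conditional XOR per byte in `q_parity` is the part that makes RAID6 the more interesting offload candidate.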
One situation where you might see immediate results is in high-volume routing. I think this project [1] is an example. They were using the GPU to saturate multiple 10GbE interfaces with a commodity processor.
I actually spoke to them about doing a similar thing, and they said they were using a GPU because they were pushing lots of sub-1500 MTU packets, and that commodity h/w could probably saturate multiple 10GbE interfaces that were just doing large packets for high-throughput.
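The small-packet point is easy to see with some arithmetic. A rough sketch, assuming standard Ethernet framing (20 bytes of preamble and inter-frame gap on the wire per frame, 64-byte minimum and 1518-byte maximum untagged frames); the function name is just for illustration.

```python
def packets_per_second(frame_bytes, link_bps=10e9, wire_overhead_bytes=20):
    """Line-rate packets per second for a given Ethernet frame size.

    wire_overhead_bytes covers the preamble/start delimiter (8 bytes)
    and the inter-frame gap (12 bytes) that accompany every frame.
    """
    return link_bps / ((frame_bytes + wire_overhead_bytes) * 8)


if __name__ == "__main__":
    small = packets_per_second(64)    # minimum-size frames
    large = packets_per_second(1518)  # full-size frames
    print("64-byte frames per second:  ", round(small))
    print("1518-byte frames per second:", round(large))
    print("ratio:", small / large)
```

Per-packet work such as route lookup scales with packets per second rather than bytes per second, so one 10GbE port of minimum-size frames means roughly 18 times as many lookups as the same port full of large frames.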
Maybe it doesn't have to saturate the GPU to be a win. If you can just banish some cache-busting streaming work, like RAID processing, to a tiny sliver of the GPU, it could be a win.