nVidia announced their Tesla GPU platform today.
- 4 GPUs in a 1U rack-mount unit
- Recommends 1 CPU per GPU
- 800 watts at max load
That last stat should scare the crap out of anyone who imagines building a full-height rack (42U) out of these. Ask your datacenter guys how excited they'd be to put in a rack that draws 280 amps. That's double what a rack of dual quad-core Clovertown Xeons will draw at full load.
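If you want to check the math, here's the back-of-the-envelope version; the 120 V circuit is my assumption (at 208 V the amperage drops proportionally), and the Clovertown figure is just what "double" implies:

```python
# Back-of-the-envelope rack power math (a sketch; 120 V is an assumption).
UNITS_PER_RACK = 42      # full-height rack
WATTS_PER_UNIT = 800     # nVidia's stated max load per 1U Tesla box
VOLTS = 120              # assumed circuit voltage

rack_watts = UNITS_PER_RACK * WATTS_PER_UNIT   # 33,600 W
rack_amps = rack_watts / VOLTS                 # 280 A

print(f"{rack_watts / 1000:.1f} kW per rack, ~{rack_amps:.0f} A at {VOLTS} V")
# "Double a Clovertown rack" works out to roughly 400 W per 1U for the
# dual quad-core Xeon boxes, which is in the right ballpark.
```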
They also cost $12,000 per 1U unit, according to Gizmodo, I think. A full rack? A cool half mil (retail). They'll throw in a free nVidia hat, I'm sure.
If you buy these, you'd better be damn sure you can get orders of magnitude better performance out of the GPU with your algorithm. 3x the performance at 3x the price isn't worth the headaches.
I've long been a skeptic of general-purpose computing on the GPU, mostly because of the practicality of the hardware itself in a farm. When nVidia first approached a company I was working for in 2003, we politely listened to the GPU-computation pitch but knew it wasn't workable. We had 1,000 machines in the renderfarm; how could we justify the $1,000,000 expense of putting a Quadro FX GPU in every one of them, much less the extra A/C to deal with the added heat?
That was entirely aside from the fact that our rendering problems at the time were memory-bound. We usually could not render two main characters in the same pass. (Sadly, this was just before AMD came out with x64. We could really have used that on that project.)
Finally, when 300 people are trying to get their stuff rendered, the perceived latency of the farm is related to the number of machines, not the speed of any individual machine. Having a large swath of your frames actively rendering feels better than knowing that frames which have been waiting for 3 hours will eventually run on really fast machines. Either way, it's hard to justify doubling (or tripling!) the price of a node for a per-node speed increase (unless rackspace and/or power are the constraint).
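To make that concrete, here's a toy model. The frame counts and timings are made up for illustration, not numbers from any real farm: two farms with roughly the same aggregate throughput, one built from lots of plain nodes and one from a third as many nodes running three times as fast.

```python
import math

def completion_hours(frames, nodes, hours_per_frame):
    """Hour at which each frame finishes, assuming a simple FIFO queue
    where every node takes one frame at a time."""
    return [math.ceil(k / nodes) * hours_per_frame for k in range(1, frames + 1)]

FRAMES = 3000
many_slow = completion_hours(FRAMES, nodes=1000, hours_per_frame=3)  # 1000 plain boxes
few_fast = completion_hours(FRAMES, nodes=333, hours_per_frame=1)    # 333 boxes, 3x faster

print("frames rendering the moment you submit: 1000 vs 333")
print("hour the last frame finishes:", many_slow[-1], "vs", few_fast[-1])  # 9 vs 10
```

The wall-clock time to finish the whole batch comes out about the same either way; the difference is that on the big farm three times as many of your frames are visibly in progress at any moment, which is what 300 impatient artists actually notice.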
And now, getting down to the nitty-gritty... where are the benchmarks for Tesla? I found some for the G80, and nVidia has even written an entire GPU-accelerated 3D renderer. You'd think they would just fire up Gelato on one of these Tesla machines and blow our minds with the performance, right?
It seems that nVidia is trying to sell these machines to industry (not just research) by appealing to the financial sector. They've put out a bunch of stats showing they can compute billions of Black-Scholes prices per second using their CUDA toolkit on the G80. An example fed with randomly generated inputs is great, but how about shuttling that much data in from the network, disk, or even RAM? It doesn't matter if you have a GPU crunching the numbers when your bottleneck is how fast you can get option quotes from the CBOT.
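nVidia's demo runs this on the GPU with CUDA; here's the same formula in plain Python (the input numbers are made up) just to show how little arithmetic a single price actually takes:

```python
import math

def norm_cdf(x):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def black_scholes_call(S, K, r, sigma, T):
    """European call price: a handful of FLOPs and a few exp/log/erf calls."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)

# Made-up inputs: spot 100, strike 100, 5% rate, 20% vol, one year to expiry.
print(black_scholes_call(100.0, 100.0, 0.05, 0.2, 1.0))  # ~10.45
```

Each price is maybe a dozen floating-point operations plus a couple of transcendentals. The hard part in a real trading system isn't that arithmetic, it's getting billions of quotes and parameters onto the card in the first place.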
There are crazy number-crunchers out there who can use this, but I'm pretty sure that the overwhelming majority of the time, you'll be better off with x86 (or x64) in terms of cost, power, and ease of programming. And let's not forget hardware reliability and ease of replacement. What's the MTBF? Is nVidia prepared to service one of these GPU compute servers within two hours, the way Dell, IBM, and HP support contracts cover plain old x86 boxes?