In the last few years computational accelerators have emerged and have by now
gained a firm foothold. They come in various forms, of which we will discuss some
general characteristics. Accelerators are not a new phenomenon: in the 1980s,
for instance, Floating Point Systems sold attached processors like the AP120-B
with a peak performance of 12 Mflop/s, easily 10 times faster than the
general-purpose systems they were connected to. Also, the processor array machines
described in the DM-SIMD section could be regarded as
accelerators for matrix-oriented computations in their time. A similar
phenomenon is upon us at the moment. HPC users tend never to be content with the
performance of the machines they have at their disposal and are continuously
looking for ways to speed up their calculations or parts of them. Accelerator
vendors are responding to this wish, and presently there is a fair number of
products that, when properly deployed, can deliver significant performance gains.
(In principle it is entirely possible to perform floating-point computations
with integer functional units, but the costs are so high that no one will
When speaking of special-purpose processors, in this case computational accelerators, one should realise that they are indeed good at some specialised computations while totally unable to perform others. So, not all applications can benefit from them, and those that can do not all benefit to the same degree. Furthermore, using accelerators effectively is not at all trivial. Although the Software Development Kits (SDKs) for accelerators have improved enormously lately, for many applications it is still a challenge to obtain a significant speedup. An important factor in this is that data must be shipped into and out of the accelerator, and the bandwidth of the connecting bus is in most cases a severe bottleneck. One generally tries to overcome this by overlapping data transport to and from the accelerator with processing. Tuning the computation and data transport tasks can be cumbersome. This hurdle has been recognised by at least three software companies: Acceleware, CAPS, and Rapidmind (now absorbed by Intel). They offer products that automatically transform standard C/C++ programs into a form that integrates the functionality of GPUs, multi-core CPUs (which are often also not used optimally), and, in the case of Rapidmind, Cell processors.
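To make the effect of overlapping transport and processing concrete, the following is a minimal sketch of an idealised timing model, not a description of any particular accelerator or SDK. It assumes that the work is split into equal chunks, that copy-in, compute, and copy-out each take a fixed time per chunk, and that the three stages can run concurrently on different chunks (which presupposes a bus that can move data in both directions at once). The function names and the example timings are hypothetical.

```python
def serial_time(n_chunks, t_in, t_compute, t_out):
    """Total time when every chunk is copied in, processed, and copied out
    strictly one after the other (no overlap)."""
    return n_chunks * (t_in + t_compute + t_out)


def overlapped_time(n_chunks, t_in, t_compute, t_out):
    """Total time for an idealised three-stage pipeline in which copy-in,
    compute, and copy-out of different chunks proceed concurrently.

    Assumption: equal chunks and constant stage times, so after the first
    chunk has passed through all stages, one chunk completes every
    max(t_in, t_compute, t_out) time units.
    """
    if n_chunks == 0:
        return 0.0
    fill = t_in + t_compute + t_out           # first chunk traverses all stages
    bottleneck = max(t_in, t_compute, t_out)  # slowest stage sets the pace
    return fill + (n_chunks - 1) * bottleneck


if __name__ == "__main__":
    # Hypothetical per-chunk timings in arbitrary units.
    cases = {
        "compute-bound": (8, 2.0, 3.0, 1.0),
        "transfer-bound": (8, 5.0, 1.0, 1.0),
    }
    for label, args in cases.items():
        s = serial_time(*args)
        o = overlapped_time(*args)
        print(f"{label}: serial={s}, overlapped={o}, gain={s / o:.2f}x")
```

In this model, overlapping hides the faster stages behind the slowest one. When the transfer itself is the slowest stage, the bus still bounds the total time, which is why the bandwidth of the connecting bus remains the limiting factor even with careful tuning.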
There is one other important consideration that makes accelerators popular: in comparison to general-purpose CPUs they are all very power-efficient. Of course they will do only part of the work in a complete system, but the power savings can still be considerable, which is very attractive these days.
We will now proceed to discuss the three classes of accelerators mentioned above.