FPGA-based Accelerators

    An FPGA (Field Programmable Gate Array) is an array of logic gates that can be hardware-programmed to fulfill user-specified tasks. In this way one can devise special-purpose functional units that may be very efficient for a limited task. Moreover, it is possible to configure multiple such units on one FPGA so that they work in parallel. So, potentially, FPGAs may be good candidates for the acceleration of certain applications. Because of their versatility it is difficult to specify where they will be most useful. In general, though, they are not used for heavy 64-bit precision floating-point arithmetic. Excellent results have been reported in searching, pattern matching, signal and image processing, encryption, etc. The clock frequency of FPGAs is low compared to that of present CPUs, 100 to 550 MHz, which means that they are very power-efficient. All vendors provide runtime environments and drivers that work with Linux as well as Windows.
    Traditionally, FPGAs are configured by describing the configuration in a hardware description language (HDL), like VHDL or Verilog. This is very cumbersome for the average programmer, as one has to define explicitly not only such details as the placement of the configured devices but also the width of the operands to be operated on, etc. This problem has been recognised by FPGA-based vendors, and a large variety of programming tools and SDKs have come into existence. Unfortunately, they differ enormously in approach and the resulting programs are far from compatible. As for GPUs, there is an initiative for FPGA-based accelerators to develop a unified API that will ensure compatibility between platforms. The non-profit OpenFPGA consortium is heading this effort. Various working groups concentrate on, for instance, a core library, an application library, and an API definition. As yet there is no unified, platform-independent way to program FPGAs, and it may take a long time to get there. The scene may change in the next few years, however, as Altera, one of the major FPGA manufacturers, recently announced that it will provide an OpenCL SDK. This should make it much easier for C programmers to develop programs that take advantage of the FPGA capabilities. For instance, the Scottish FPGA integrator Nallatech is offering various Altera Stratix V configurations to be used with OpenCL.
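To give an impression of what having to specify the width of the operands means in practice, the following minimal sketch emulates a fixed-width unsigned adder in Python. It is only an illustration and not HDL code or any vendor's tool flow; the function names `add_fixed_width` and `bits_needed_for_sum` are hypothetical and chosen for this example.

```python
def add_fixed_width(a: int, b: int, width: int) -> int:
    """Emulate an unsigned adder whose inputs and output are `width` bits wide.

    Bits that do not fit in `width` bits are silently dropped, as they
    would be in a hardware register of that size.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    mask = (1 << width) - 1  # keeps only the lowest `width` bits
    return ((a & mask) + (b & mask)) & mask


def bits_needed_for_sum(width: int, n_terms: int) -> int:
    """Smallest unsigned width that can hold the sum of `n_terms`
    values that are each `width` bits wide."""
    if width <= 0 or n_terms <= 0:
        raise ValueError("width and n_terms must be positive")
    max_sum = n_terms * ((1 << width) - 1)  # all operands at their maximum
    return max_sum.bit_length()


if __name__ == "__main__":
    # An 8-bit adder overflows on 200 + 100 ...
    print(add_fixed_width(200, 100, 8))
    # ... while a 9-bit adder holds the full result.
    print(add_fixed_width(200, 100, 9))
    # Width required to accumulate 256 eight-bit values without overflow.
    print(bits_needed_for_sum(8, 256))
```

On a CPU the programmer simply picks a standard integer type; on an FPGA a narrower unit costs fewer logic resources, so the designer has to make a choice like the one above for every functional unit.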

    The two big players on the FPGA market are Altera and Xilinx. However, in the accelerator business one will seldom find these names mentioned, because the FPGAs they produce are packaged by other vendors in a form that makes them usable for accelerator purposes.

    It is not possible to discuss fully all vendors that offer FPGA-based products. One reason is that there is a very large variety of products, ranging from complete systems to small appliances housing one FPGA and the appropriate I/O logic to communicate with the outside world. To complicate matters further, the FPGAs themselves come in many variants, e.g., with I/O channels, memory blocks, multipliers, or DSPs already configured (or even fixed), and one can choose FPGAs that have, for instance, a PowerPC 405 embedded. Therefore we present the FPGA accelerators here only in the most global way, and the presentation is necessarily incomplete.

    We discuss three example systems: those of Convey, Kuberre, and SRC. For these systems the vendors have gone to great lengths not to expose their users to the use of HDLs, although for the highest benefits this cannot always be avoided. Necessarily, we are here again somewhat arbitrary because this area is changing extremely rapidly.