The Convey systems

The HC-2 is an example of the hybrid solutions that have come up to avoid the unwieldy HDL programming of FPGAs while still benefitting from their potential acceleration capabilities. The HC-2 comprises a familiar x86 front-end, running a modified CentOS Linux distribution under the name of Convey Linux, and a co-processor part that contains 4 Xilinx Virtex 5 or Virtex 6 FPGAs. These can be configured into a variety of "personalities" that accommodate users from different application areas; personalities are offered for, e.g., the Oil and Gas industry, the Financial Analytics market, and the Life Sciences.


The Convey HC-2

The Convey HC-2(ex) was announced in May 2012. It is the second generation of this type of machine. The main difference between the present model and the former HC-1 is the use of a more recent Intel host processor (see Intel Xeon) and/or a larger, newer Xilinx FPGA: a Virtex 5 LX330 in the HC-2 model and the larger Virtex 6 LX760 in the HC-2ex. In Figure 24 we give a diagram of the structure of the HC-2 co-processor.

Figure 24: Block diagram of the Convey HC-2 and HC-2ex.

A personality that will often be used for scientific and technical work is the vector personality. Thanks to the compilers provided by Convey, standard code in Fortran and C/C++ can be automatically vectorised and executed on the vector units that have been configured in the 4 FPGAs, for a total of 32 function pipes. Each of these contains a vector register file, four pipes that can execute Floating Multiply Add instructions, a pipe for Integer, Logical, Divide, and Miscellaneous instructions, and a Load/Store pipe. For other selected personalities the compilers will generate code that is optimal for the instruction mix supported by the appropriately configured FPGAs in the Application Engine.
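
As an illustration of what this automatic vectorisation works on, the sketch below shows a plain C triad loop of the kind the Convey compilers are said to handle. It is standard, portable C that any compiler accepts; no Convey-specific compiler options or intrinsics are shown, as these are installation details not covered here.

  #include <stdio.h>
  #include <stdlib.h>

  #define N 1000000

  int main(void)
  {
      /* Two operand vectors; plain C arrays, nothing Convey-specific.   */
      double *x = malloc(N * sizeof *x);
      double *y = malloc(N * sizeof *y);
      if (x == NULL || y == NULL)
          return EXIT_FAILURE;

      for (int i = 0; i < N; i++) {
          x[i] = 1.0 / (i + 1);
          y[i] = 2.0;
      }

      const double a = 3.0;
      /* The loop body is one multiply and one add per element, the
         operation the Floating Multiply Add pipes are built for; a
         vectorising compiler can map the whole loop onto them.          */
      for (int i = 0; i < N; i++)
          y[i] = a * x[i] + y[i];

      printf("y[0] = %g, y[N-1] = %g\n", y[0], y[N - 1]);
      free(x);
      free(y);
      return EXIT_SUCCESS;
  }

From the figures above one can at least derive a per-cycle estimate for this personality: 32 function pipes × 4 Floating Multiply Add pipes × 2 floating-point operations per multiply-add gives 256 flops per clock cycle when all pipes are kept busy. As the clock frequency of the configured FPGAs is not quoted here, no absolute peak figure is attached to this estimate.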
The Application Engine Hub shown in Figure 24 contains not only the interface to the x86 host but also the part that maps the instructions onto the Application Engine. In addition, it will perform some scalar processing that is not readily passed on to the Application Engine.
Because the system has so many different faces, it is hard to quote a single peak performance for it. By now there is some experience with the HC-2 in various application areas, and it appears that the system does well in a number of Life Science applications and in data analytics.


The Convey MX-100

This system has virtually the same structure as the HC-2 model. It is explicitly targeted at High Performance Data Analysis (HPA). This type of workload is characterised by large amounts of parallelism but highly irregular data access patterns and a low computational content. These are unfavourable conditions for standard RISC processors, which try to hide memory latency by means of the cache hierarchy. For irregular data access with no discernible data locality, however, this will not work. Like the HC-2, the MX-100 accesses data on a 64-bit double-word basis from its Scatter-Gather memory and, in addition, every double word carries a full/empty bit that allows for acceleration of graph searching and atomic in-memory operations. The MX-100 has these properties in common with the Cray uRIKA systems. However, as the emphasis in this report is on HPC and not on HPA, we will refrain from discussing the system in more detail.
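
To make the contrast with cache-friendly workloads concrete, the sketch below shows the kind of access pattern meant here: values are gathered through a randomly permuted index array, so there is essentially one addition per 64-bit word loaded and no locality for a cache hierarchy to exploit. It is plain, portable C for illustration only; it does not use the Scatter-Gather memory or the full/empty bits, which require Convey-specific support not shown here.

  #include <stdio.h>
  #include <stdlib.h>

  #define N (1 << 22)   /* 4M elements: two 32 MB arrays, larger than any cache */

  int main(void)
  {
      double *data  = malloc(N * sizeof *data);
      size_t *index = malloc(N * sizeof *index);
      if (data == NULL || index == NULL)
          return EXIT_FAILURE;

      for (size_t i = 0; i < N; i++) {
          data[i]  = (double)i;
          index[i] = i;
      }

      /* Randomly permute the index array so that consecutive iterations
         of the gather loop below touch unrelated memory locations.       */
      srand(42);
      for (size_t i = N - 1; i > 0; i--) {
          size_t j = (size_t)rand() % (i + 1);
          size_t t = index[i];
          index[i] = index[j];
          index[j] = t;
      }

      /* One 64-bit load plus one addition per iteration: high memory
         traffic, no locality, and very little computation.               */
      double sum = 0.0;
      for (size_t i = 0; i < N; i++)
          sum += data[index[i]];

      printf("sum = %.1f\n", sum);
      free(data);
      free(index);
      return EXIT_SUCCESS;
  }

On a cache-based processor virtually every iteration of the gather loop stalls on a memory access, whereas a memory system built for single-word scatter/gather traffic, as in the MX-100, can keep many such references in flight.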