The HC-2 is an example of the hybrid solutions that have come up to avoid the unwieldy HDL programming of FPGAs while still benefitting from their potential acceleration capabilities. The HC-2 comprises a familiar x86 front-end running a modified CentOS Linux distribution under the name of Convey Linux. In addition, there is a co-processor part that contains 4 Xilinx Virtex 5 or Virtex 6 FPGAs that can be configured into a variety of ``personalities'' that accommodate users from different application areas. Personalities are offered for, e.g., the Oil and Gas industry, the Financial Analytics market, and the Life Sciences.
The Convey HC-2

The Convey HC-2(ex) was announced in May 2012. It is the second generation of this type of machine. The main differences between the present models and the former HC-1 are the use of a more recent Intel host processor (see Intel Xeon) and/or a larger, newer Xilinx FPGA: a Virtex 5 LX330 in the HC-2 model and the larger Virtex 6 LX760 in the HC-2ex. In Figure 24 we give a diagram of the structure of the HC-2 co-processor.

Figure 24: Block diagram of the Convey HC-2 and HC-2ex.

A personality that will often be used for scientific and technical work is the
vector personality. Thanks to the compilers provided by Convey, standard code in
Fortran and C/C++ can be automatically vectorised and executed on the vector units
that have been configured in the 4 FPGAs for a total of 32 function pipes.
Each of these contains a vector register file, four pipes that can execute
Floating Multiply Add instructions, a pipe for Integer, Logical, Divide, and
Miscellaneous instructions, and a Load/Store pipe. For other selected
personalities the compilers will generate code that is optimal for the
instruction mix supported by the appropriately configured FPGAs in the
Application Engine.
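
To make "standard code" concrete, the following is a minimal sketch in plain C of the kind of loop that auto-vectorising compilers can typically map onto multiply-add pipes: one multiply-add per element, unit stride, and no dependence between iterations. The function name `axpy` and the use of `restrict` are choices made for this sketch and are not Convey-specific; whether and how the Convey compilers vectorise this exact loop is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* y[i] = a * x[i] + y[i] for i = 0 .. n-1.
 * Hypothetical example routine, not part of any Convey library.
 * `restrict` promises the compiler that x and y do not overlap, which
 * removes the aliasing doubt that often blocks automatic vectorisation. */
void axpy(size_t n, double a, const double *restrict x, double *restrict y)
{
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; ++i) {
        /* One multiply and one add per element: a candidate for a single
         * fused multiply-add instruction on hardware that provides one. */
        y[i] = a * x[i] + y[i];
    }
}
```

Calling `axpy(n, 2.0, x, y)` on two non-overlapping arrays of length `n` updates `y` in place. With 32 function pipes of four multiply-add pipes each, the configuration described above provides 128 such pipes in total over which a vectorised loop of this kind could be spread.
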
The Convey MX-100

This system has virtually the same structure as the HC-2 model. It is explicitly targeted at High Performance Data Analysis (HPA). This type of workload is characterised by large amounts of parallelism but highly irregular data access patterns and a low computational content. These are unfavourable conditions for standard RISC processors, which try to hide memory latency by means of the cache hierarchy: for irregular data access with no discernible data locality this will not work. As in the HC-2, the MX-100 accesses data on a 64-bit double word basis from its Scatter-Gather memory and, in addition, every double word carries a full/empty bit that allows for acceleration of graph searching and atomic in-memory operations. The MX-100 has these properties in common with the Cray uRIKA systems. However, as the emphasis in this report is on HPC and not on HPA, we will refrain from discussing the system in more detail.
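
The idea behind a full/empty bit can be illustrated with a small software model in C. This is only a sketch of the conventional semantics of such tagged memory words (a write succeeds on an empty word and marks it full, a read succeeds on a full word and marks it empty). The type `fe_word` and both function names are hypothetical, and it is an assumption that the MX-100 hardware behaves exactly this way. Where the model returns `false`, real hardware would typically make the requester wait.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical model of one 64-bit memory word with a full/empty tag.
 * Not a Convey data structure or API. */
typedef struct {
    uint64_t value;
    bool     full;
} fe_word;

/* Store v only if the word is empty, then mark the word full.
 * Returns false (and changes nothing) if the word was already full. */
bool fe_write_when_empty(fe_word *w, uint64_t v)
{
    assert(w != NULL);
    if (w->full)
        return false;
    w->value = v;
    w->full = true;
    return true;
}

/* Load the value only if the word is full, then mark the word empty.
 * Returns false (and leaves *out untouched) if the word was empty. */
bool fe_read_when_full(fe_word *w, uint64_t *out)
{
    assert(w != NULL && out != NULL);
    if (!w->full)
        return false;
    *out = w->value;
    w->full = false;
    return true;
}
```

In this model a read-modify-write becomes a pair of calls: `fe_read_when_full` takes the value and leaves the word empty, so any other requester has to wait until `fe_write_when_empty` puts the updated value back. That is the sense in which one tag bit per word can support atomic in-memory updates without a separate lock.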