Kuberre

Since May 2009 Kuberre has marketed its FPGA-based HANSA system. The information provided is extremely scant. The company has traditionally been involved in financial computing and, with the rising need for HPC in this sector, Kuberre has built a system that houses 1--16 boards, each with 4 Altera Stratix II FPGAs and 16 GB of memory, in addition to one dual-core x86-based board that acts as a front-end. The host board runs the Linux or Windows OS and the compilers.
For programming, a C/C++ or Java API is available. Although Kuberre is naturally highly oriented towards the financial-analytics market, the little material that is accessible shows that libraries such as ScaLAPACK, Monte-Carlo algorithms, FFTs, and wavelet transforms are available. For the Life Sciences, standard applications like BLAST and Smith-Waterman are present. The standard GNU C libraries can also be linked seamlessly.
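Since Kuberre's API is not publicly documented, the sketch below is purely illustrative and uses no Kuberre-specific calls: a minimal Monte-Carlo pricer for a European call option in plain C, relying only on the standard (GNU) C library. All function names and parameter values are hypothetical; it merely shows the kind of financial Monte-Carlo kernel and generic C code referred to in this section.

    /* Illustrative only: a plain-C Monte-Carlo pricer for a European call
     * option under geometric Brownian motion.  It uses nothing beyond the
     * standard C library, so it is the sort of code that, according to the
     * vendor, can be compiled and linked unchanged.                        */
    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>

    #define TWO_PI 6.283185307179586

    /* Box-Muller transform: two uniform deviates -> one standard-normal one */
    static double gauss(void)
    {
        double u1 = (rand() + 1.0) / (RAND_MAX + 2.0);
        double u2 = (rand() + 1.0) / (RAND_MAX + 2.0);
        return sqrt(-2.0 * log(u1)) * cos(TWO_PI * u2);
    }

    int main(void)
    {
        const double S0 = 100.0, K = 105.0;   /* spot price and strike      */
        const double r = 0.05, sigma = 0.2;   /* interest rate, volatility  */
        const double T = 1.0;                 /* time to maturity (years)   */
        const long   N = 1000000;             /* number of simulated paths  */
        double sum = 0.0;

        for (long i = 0; i < N; i++) {
            /* risk-neutral terminal price S_T = S0*exp((r - s^2/2)T + s*sqrt(T)*Z) */
            double ST = S0 * exp((r - 0.5 * sigma * sigma) * T
                                 + sigma * sqrt(T) * gauss());
            sum += (ST > K) ? ST - K : 0.0;   /* call payoff max(S_T - K, 0) */
        }
        printf("Monte-Carlo call price: %.4f\n", exp(-r * T) * sum / (double)N);
        return 0;
    }

How such a loop would be mapped onto the RISC CPUs or double-precision cores mentioned below is not disclosed in the available material.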
The processors are organised in a grid fashion and use a 256 GB distributed shared cache to combat data-access latency. The system comes configured either with 768 RISC CPUs for what are called "generic C/C++ programs" or with 1536 double-precision cores for heavy numerical work. It is possible to split the system to run up to 16 different "contexts" (reminiscent of Convey's personalities, see The Convey HC-1). One part of the machine may, for instance, be dedicated to a Life Science application while other parts work on encryption and numerical applications.
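If the 768/1536 core counts and the 256 GB cache refer to a fully configured 16-board system, as the scant material suggests, a rough breakdown per board and per FPGA would be:

    768 RISC CPUs  / 16 boards = 48 per board, i.e. 12 per Stratix II FPGA
    1536 DP cores  / 16 boards = 96 per board, i.e. 24 per Stratix II FPGA
    16 boards x 16 GB per board = 256 GB, presumably the distributed shared cache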
As for the Convey HC-1, it is hardly possible to give performance figures, but a fully configured machine with 16 boards should be able to obtain 250 Gflop/s on the Linpack benchmark.
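Taking that figure at face value, the implied contributions per unit are roughly:

    250 Gflop/s / 16 boards   ≈ 15.6 Gflop/s per board
    250 Gflop/s / 64 FPGAs    ≈  3.9 Gflop/s per FPGA
    250 Gflop/s / 1536 cores  ≈  0.16 Gflop/s per double-precision core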
The publicly available material does not allow us to show a reliable block diagram, but this may change once the system is installed at sites that want to evaluate it.