The Eurotech Aurora

Machine type: Distributed-memory multi-processor
Models: Aurora 5500
Operating system: Linux
Connection structure: 3-D Torus + Tree network
Compilers: Fortran 90, OpenMP, C, C++
Vendor's information Web page: www.eurotech.com/en/hpc/
Year of introduction: 2009

System parameters:

Model: Aurora 5500
Clock cycle: 2.93 GHz
Theor. peak performance:
  Per core (64-bit): 11.7 Gflop/s
  Per 8-chassis rack: 24 Tflop/s
  Max. configuration: not specified
Memory:
  Per node: ≤ 48 GB
  Max. configuration: not specified
No. of processors:
  Per cabinet: 192
  Max. configuration: not specified
Communication bandwidth:
  Point-to-point (theor.): 2.5 GB/s
  Aggregate per node: 60 GB/s

Remarks:

The Aurora system has most of the characteristics of an average cluster, but a number of distinguishing factors warrant its description in this report. For one thing, the system is water-cooled on a per-node basis, like the IBM POWER6 systems (see the IBM eServer p575), which greatly contributes to its power efficiency. Furthermore, the system no longer contains any spinning disks: it is entirely equipped with solid-state disks for I/O. This also contributes to the energy efficiency, as no power is used for storage that is not active.

The interconnect infrastructure is out of the ordinary in comparison with that of a standard cluster. It has a QDR Infiniband network in common with other clusters but, in addition, it also contains a 3-D torus network. Eurotech calls this combination its Unified Network Architecture, with a latency of about 1 µs and a point-to-point bandwidth of 2.5 GB/s. The network processor is in fact a rather large Altera Stratix FPGA that provides the possibility of reconfiguring the network and of hardware synchronisation of MPI primitives.
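
To make the last point concrete, the sketch below is a generic MPI program, not Aurora-specific code: the collective operations it uses, MPI_Barrier and MPI_Allreduce, are the kind of synchronising primitives that an in-network FPGA could in principle handle in hardware rather than in software on the host.

/* Generic MPI example (illustrative only, not Aurora-specific): the two
   collectives below are typical candidates for hardware-assisted
   synchronisation by an intelligent network processor. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    double local, sum;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Global synchronisation point across all ranks. */
    MPI_Barrier(MPI_COMM_WORLD);

    /* Global reduction: every rank contributes a value, all receive the sum. */
    local = (double) rank;
    MPI_Allreduce(&local, &sum, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("Sum over %d ranks: %.0f\n", size, sum);

    MPI_Finalize();
    return 0;
}

Because the program relies only on standard MPI calls, it would build unchanged against the vendor-optimised MPI mentioned below or against any system-agnostic MPI implementation.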

An Aurora node consists of two Intel Xeon 5500-series (Nehalem EP) processors, each with its associated DDR3 memory, giving a maximum of 48 GB per node. Via the Tylersburg bridge the processors are connected through PCIe Gen2 to the network processor, containing the Stratix FPGA, that drives the 3-D torus network and a Mellanox ConnectX Infiniband HCA.
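
As a check on the figures in the parameter table: assuming the usual Nehalem capability of 4 double-precision floating-point results per clock cycle (one 2-wide SSE addition plus one 2-wide SSE multiplication), the per-core peak follows directly from the clock frequency:

\[ 2.93\ \mathrm{GHz} \times 4\ \mathrm{flop/cycle} \approx 11.7\ \mathrm{Gflop/s}\ \text{per core}. \]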

In principle the FPGA has sufficient capacity to also be used as a computational accelerator, but Eurotech has no fixed plans yet to offer it as such. Eurotech does not give a maximum configuration for the Aurora, but the brochures suggest that it considers building a Petaflop/s system (42 racks, i.e., 42 × 24 Tflop/s ≈ 1 Pflop/s) entirely possible.

Although the Aurora documentation is not very clear on the available software, it is evident that Linux is the OS and that the usual Intel compiler suite is available. The MPI version is optimised for the architecture, but system-agnostic MPI versions can also be used.

Measured Performances:

At the time of writing this report no official performance figures for the Aurora are available.