The Eurotech Aurora


    Machine type Distributed-memory multi-processor
    Models AuroraHPC 10-10
    Operating system Linux
    Connection structure 3-D Torus + Tree network
    Compilers Fortran 90, OpenMP, C, C++
    Vendor's information Web page www.eurotech.com/en/hpc/hpc+solutions/aurora+hpc+systems
    Year of introduction 2012

    System parameters:

    Model                         AuroraHPC 10-10
    Clock cycle                   ≈ 3.1 GHz
    Theor. peak performance
       Per core (64-bits)         ≈ 24.8 Gflop/s
       Per 8-chassis rack         ≈ 100 Tflop/s
       Max. configuration         --
    Memory
       Per node                   ≤ 32 GB
       Max. configuration         --
    No. of processors             --
    Communication bandwidth
       Point-to-point (theor.)    2.5 GB/s
       Aggregate per node         60 GB/s
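    The per-core figure follows directly from the clock frequency if one assumes the 8 double-precision floating-point operations per cycle that an AVX-capable core can deliver: 3.1 GHz × 8 ≈ 24.8 Gflop/s. Combined with the two 12-core processors per node described in the Remarks below, this gives roughly 595 Gflop/s per node. The small C sketch below only restates this arithmetic; the 8 flops/cycle figure is an assumption on our part, not a vendor specification.

    /* Back-of-the-envelope peak-performance check for the AuroraHPC 10-10.
     * Assumes 8 double-precision flops per cycle per core (AVX: 4-wide add
     * plus 4-wide multiply), consistent with the quoted per-core figure.
     */
    #include <stdio.h>

    int main(void)
    {
        const double clock_ghz       = 3.1;  /* quoted clock cycle           */
        const double flops_per_cycle = 8.0;  /* assumed AVX width (DP)       */
        const int    cores_per_cpu   = 12;   /* quoted Ivy Bridge core count */
        const int    cpus_per_node   = 2;    /* quoted node configuration    */

        double per_core = clock_ghz * flops_per_cycle;              /* Gflop/s */
        double per_node = per_core * cores_per_cpu * cpus_per_node; /* Gflop/s */

        printf("peak per core: %.1f Gflop/s\n", per_core);  /* ~24.8  */
        printf("peak per node: %.1f Gflop/s\n", per_node);  /* ~595.2 */
        return 0;
    }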

    Remarks:

    We only discuss the latest model, the AuroraHPC 10-10, here, as the earlier model has the same macro-architecture but employs the less powerful Nehalem EP processors instead of Ivy Bridge processors.

    The Aurora system has most characteristics of the average cluster, but there are a number of distinguishing factors that warrant its description in this report. For instance, one can opt for SSD storage instead of spinning disks. The per-node liquid cooling also contributes to the energy efficiency, and no power is used for memory that is not active.

    The interconnect infrastructure is out of the ordinary in comparison with the standard cluster. It has a QDR Infiniband network in common with other clusters but, in addition, it also contains a 3-D torus network. Together these form what Eurotech calls its Unified Network Architecture, with a latency of about 1 µs and a point-to-point bandwidth of 2.5 GB/s. The network processor is in fact a rather large Altera Stratix IV FPGA, which provides the possibility of reconfiguring the network and of hardware synchronisation of MPI primitives.
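    To put these figures in perspective, the sketch below applies a simple latency-plus-bandwidth (α + n/β) transfer-time model to the quoted ~1 µs latency and 2.5 GB/s point-to-point bandwidth. The model and the message sizes chosen are illustrative assumptions only; actual MPI performance on the Aurora will depend on protocol, routing, and contention.

    /* Alpha-beta (latency + size/bandwidth) transfer-time estimate using the
     * figures quoted for Eurotech's Unified Network Architecture: ~1 us
     * latency and 2.5 GB/s point-to-point bandwidth.  A sketch only.
     */
    #include <stdio.h>

    int main(void)
    {
        const double alpha   = 1.0e-6;              /* latency in seconds (~1 us)      */
        const double beta    = 2.5e9;               /* bandwidth in bytes/s (2.5 GB/s) */
        const double sizes[] = { 1e3, 1e5, 1e7 };   /* example message sizes (bytes)   */

        for (int i = 0; i < 3; i++) {
            double t = alpha + sizes[i] / beta;     /* estimated transfer time (s)     */
            printf("%.0e bytes: %.1f us\n", sizes[i], t * 1e6);
        }
        return 0;
    }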

    An Aurora node consists of two 12-core Intel Ivy Bridge processors, each with its associated DDR3 memory, giving a maximum of 32 GB per node. Via the Tylersburg bridge the processors are connected through PCIe Gen2 to the network processor, containing the Stratix FPGA, that drives the 3-D torus network, and to a Mellanox ConnectX Infiniband HCA.

    In principle the FPGA has sufficient capacity to also be used as a computational accelerator, but Eurotech has no fixed plans yet to offer it as such. Eurotech does not give a maximum configuration for the Aurora, but the brochures suggest that it considers building a Petaflop system (10 racks) entirely possible.

    Although the Aurora documentation is not very clear on the software that is available, it is evident that Linux is the OS and that the usual Intel compiler suite is available. The MPI version is optimised for the architecture, but system-agnostic MPI versions can also be used.
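    As an illustration of the kind of code the optimised MPI stack targets, the minimal, system-agnostic MPI program below exercises the collective primitives (barrier, allreduce) that the FPGA network processor is said to be able to synchronise in hardware. Nothing in it is Aurora-specific; it is simply the sort of standard MPI code that should run unchanged on the system.

    /* Minimal, system-agnostic MPI example of collective primitives
     * (barrier, allreduce).  Illustrative only; not Aurora-specific.
     */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        MPI_Barrier(MPI_COMM_WORLD);                /* global synchronisation */

        double local = (double)rank, sum = 0.0;
        MPI_Allreduce(&local, &sum, 1, MPI_DOUBLE,  /* global reduction       */
                      MPI_SUM, MPI_COMM_WORLD);

        if (rank == 0)
            printf("ranks: %d, sum of ranks: %.0f\n", size, sum);

        MPI_Finalize();
        return 0;
    }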

    Measured Performances:

    At the time of writing this report no official performance figures for the AuroraHPC 10-10 are available.

     

    The Eurotech Aurora Tigon

     

    Machine type Distributed-memory multi-processor
    Models Aurora Tigon
    Operating system Linux
    Connection structure 3-D Torus + Tree network
    Compilers Fortran 90, OpenMP, C, C++, CUDA, OpenCL
    Vendor's information Web page www.eurotech.com/en/hpc/hpc+solutions/aurora+hpc+systems/Aurora+Tigon
    Year of introduction 2012

    System parameters:

    Model                         Aurora Tigon
    Clock cycle                   ≈ 3.1 GHz
    Theor. peak performance
       Per core (64-bits)         ≈ 24.8 Gflop/s
       Per 8-chassis rack         ≤ 350 Tflop/s
       Max. configuration         --
    Memory
       Per node                   ≤ 32 GB
       Max. configuration         --
    No. of processors             --
    Communication bandwidth
       Point-to-point (theor.)    2.5 GB/s
       Aggregate per node         60 GB/s

    Remarks:

    As with the Bull systems, Eurotech markets an accelerator-enhanced system, called the Tigon. In the Tigon, 2 of the standard CPUs in a node can be replaced by either NVIDIA Kepler K20Xs or by Intel Xeon Phis (or any mix thereof). This should lead to a peak performance that is about 3.5 times higher than that of a CPU-only rack: ≈ 350 Tflop/s at a power consumption of about 100 kW per rack.
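    The rack-level numbers quoted above can be checked with a few lines of arithmetic, as in the sketch below: 3.5 × ≈ 100 Tflop/s gives the ≈ 350 Tflop/s peak, and dividing by the ≈ 100 kW power envelope yields a theoretical ≈ 3.5 Gflop/s per watt. These are peak-derived figures only, not measured efficiency.

    /* Sketch of the rack-level figures quoted for the Aurora Tigon: an
     * accelerated rack at ~3.5x the ~100 Tflop/s of a CPU-only rack,
     * drawing about 100 kW, giving a theoretical ~3.5 Gflop/s per watt.
     */
    #include <stdio.h>

    int main(void)
    {
        const double cpu_rack_tflops = 100.0;  /* CPU-only 8-chassis rack  */
        const double accel_factor    = 3.5;    /* quoted speed-up factor   */
        const double rack_power_kw   = 100.0;  /* quoted power consumption */

        double tigon_tflops = cpu_rack_tflops * accel_factor;           /* ~350 */
        double gflops_per_w = tigon_tflops * 1e3 / (rack_power_kw * 1e3);

        printf("Tigon rack peak : %.0f Tflop/s\n", tigon_tflops);
        printf("Peak efficiency : %.1f Gflop/s per watt\n", gflops_per_w);
        return 0;
    }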

    Measured Performances:

    At the time of writing this report no official performance figures for the Aurora Tigon are available.