The Cray Inc. XE6

    Machine type:                    Distributed-memory multi-processor
    Models:                          XE6
    Operating system:                UNICOS/lc, Cray's microkernel Unix
    Connection structure:            3-D Torus
    Compilers:                       Fortran 95, C, C++, UPC, Co-Array Fortran
    Vendor's information Web page:   www.cray.com/Products/XE/CrayXE6System
    Year of introduction:            2010

    System parameters:

    Model                        Cray XE6
    Clock frequency              2.3 GHz
    Theor. peak performance
      Per Processor              12×9.2 Gflop/s
      Per Cabinet                21.2 Tflop/s
      Max. Configuration         (not given)
    Memory
      Per Cabinet                ≤ 6.14 TB
      Max. Configuration         (not given)
    No. of processors
      Per Cabinet                192
      Max. Configuration         (not given)
    Communication bandwidth
      Point-to-point             ≤ 8.3 GB/s
      Bisectional/cabinet        2.39 TB/s
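
    The peak performance figures in the table follow from simple arithmetic, which the minimal sketch below reproduces. The constant FLOPS_PER_CYCLE is an assumption inferred from the table itself (9.2 Gflop/s per core divided by the 2.3 GHz clock); it is not stated in the text, and the function names are purely illustrative.

```python
# Values from the system parameter table. FLOPS_PER_CYCLE is inferred
# (9.2 Gflop/s per core / 2.3 GHz) and is an assumption, not a quoted figure.
CLOCK_GHZ = 2.3
FLOPS_PER_CYCLE = 4
CORES_PER_PROCESSOR = 12
PROCESSORS_PER_CABINET = 192


def core_peak_gflops(clock_ghz=CLOCK_GHZ, flops_per_cycle=FLOPS_PER_CYCLE):
    """Theoretical peak of one core in Gflop/s."""
    return clock_ghz * flops_per_cycle


def processor_peak_gflops():
    """Theoretical peak of one 12-core processor in Gflop/s."""
    return CORES_PER_PROCESSOR * core_peak_gflops()


def cabinet_peak_tflops():
    """Theoretical peak of one cabinet in Tflop/s."""
    return PROCESSORS_PER_CABINET * processor_peak_gflops() / 1000.0


if __name__ == "__main__":
    print(f"per core:      {core_peak_gflops():.1f} Gflop/s")
    print(f"per processor: {processor_peak_gflops():.1f} Gflop/s")
    print(f"per cabinet:   {cabinet_peak_tflops():.1f} Tflop/s")
```

    Rounded to one decimal, the per-cabinet result agrees with the 21.2 Tflop/s listed in the table.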

    Remarks:

    The structure of the Cray machines has proved to be very stable over the years: a 3-D torus that connects the processor nodes. The nodes as well as the routers have gone through quite a development, however: from the earliest XT systems with a single AMD core per node to the XE6, whose nodes hold two 12-core Magny-Cours processors. The interconnect routers have likewise evolved from the first SeaStar router to the new Gemini, perhaps the most distinguishing feature of the system. The Gemini is based on the 48-port YARC chip, which has a 160 GB/s internal aggregate bandwidth. Since the Gemini Network Interface Card (NIC) operates at 650 MHz and is able to transfer 64 B every 5 cycles, the bandwidth per direction is 8.3 GB/s, while the latency varies from 0.7 to 1.4 µs depending on the type of transfer [1]. In practice, bandwidths of over 6 GB/s per direction were measured, consistent with the claim in Cray's brochure of an injection bandwidth of over 20 GB/s per node. A nice feature of the Gemini router is that it supports adaptive routing, even on a packet-by-packet basis. As the 3-D torus topology is vulnerable to link failures, this makes the network much more robust.
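
    The per-direction bandwidth quoted for the Gemini NIC can be checked directly from the clock frequency and the transfer size. The sketch below does only that arithmetic; the function names are hypothetical, and 1 GB is taken to mean 10^9 bytes, which is an assumption about the units used in the text.

```python
# Gemini NIC figures quoted in the text.
NIC_CLOCK_HZ = 650e6        # NIC clock frequency
BYTES_PER_TRANSFER = 64     # bytes moved per transfer
CYCLES_PER_TRANSFER = 5     # NIC cycles needed per transfer


def nic_bandwidth_gb_per_s(clock_hz=NIC_CLOCK_HZ,
                           bytes_per_transfer=BYTES_PER_TRANSFER,
                           cycles_per_transfer=CYCLES_PER_TRANSFER):
    """Theoretical bandwidth per direction in GB/s (assuming 1 GB = 1e9 bytes)."""
    return clock_hz * bytes_per_transfer / cycles_per_transfer / 1e9


def efficiency(measured_gb_per_s, peak_gb_per_s):
    """Fraction of the theoretical peak reached by a measured bandwidth."""
    return measured_gb_per_s / peak_gb_per_s


if __name__ == "__main__":
    peak = nic_bandwidth_gb_per_s()
    print(f"theoretical: {peak:.2f} GB/s per direction")
    # 6 GB/s is the lower bound of the measured values mentioned in the text.
    print(f"6 GB/s measured is {efficiency(6.0, peak):.0%} of peak")
```

    The computed value of 8.32 GB/s matches the 8.3 GB/s in the text, so the measured bandwidths of over 6 GB/s correspond to somewhat more than 70% of the theoretical figure.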

    Besides the compute nodes there are I/O nodes that can be configured as interactive nodes or as nodes that connect to background storage. The I/O nodes contain only one Opteron processor but, in contrast to the compute nodes, they run a full Linux operating system. The compute nodes run a special Linux variant, called Extreme Scalability Mode, that greatly reduces the variability of the runtimes of similar tasks. This ensures very predictable execution times, as no interference from system tasks occurs. Such interference, the so-called OS jitter, can be quite detrimental to overall performance, especially for very large machine configurations. In the IBM BlueGene systems (see the IBM BlueGene/L&P entry) a similar separation between compute and service nodes is employed.

    Cray offers the usual compilers and AMD's ACML numerical library, but also its own scientific library and compilers for the PGAS languages UPC and Co-Array Fortran (CAF). Besides Cray's MPI implementation, its shmem library for one-sided communication is also available.

    An XE6m model is not yet available, but one is to be expected, analogous to the predecessors XT5 and XT5m, where "m" stands for midrange. If so, the XE6m will have at most 6 cabinets, with a peak speed of just over 125 Tflop/s. For the XE6 model itself no maximum configuration is given; the Cray documentation suggests that more than a million cores would be possible.
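
    The configuration figures in this paragraph can be related to the per-cabinet numbers in the parameter table. The sketch below is illustrative arithmetic only: the cabinet count for a million cores is not a Cray figure, and it assumes that every cabinet is filled with compute processors, ignoring I/O and service nodes.

```python
import math

# Per-cabinet values from the system parameter table.
CABINET_PEAK_TFLOPS = 21.2
PROCESSORS_PER_CABINET = 192
CORES_PER_PROCESSOR = 12
# Speculative cabinet limit for an XE6m, as mentioned in the text.
XE6M_MAX_CABINETS = 6


def system_peak_tflops(cabinets, cabinet_peak=CABINET_PEAK_TFLOPS):
    """Theoretical peak in Tflop/s for a given number of cabinets."""
    return cabinets * cabinet_peak


def cabinets_for_cores(cores):
    """Smallest number of full cabinets that provides at least `cores` cores.

    Assumes compute processors only (no I/O or service nodes).
    """
    cores_per_cabinet = PROCESSORS_PER_CABINET * CORES_PER_PROCESSOR
    return math.ceil(cores / cores_per_cabinet)


if __name__ == "__main__":
    print(f"XE6m peak: {system_peak_tflops(XE6M_MAX_CABINETS):.1f} Tflop/s")
    print(f"cabinets for 1,000,000 cores: {cabinets_for_cores(1_000_000)}")
```

    Six cabinets give 127.2 Tflop/s, in line with the "just over 125 Tflop/s" above, while a million cores would take several hundred cabinets under these assumptions.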

    Measured Performances:

    The Cray XE6 was introduced in May 2010 and as yet no performance results are available.