The Cray Inc. XE6

Machine type:                   Distributed-memory multi-processor
Models:                         XE6
Operating system:               UNICOS/lc, Cray's microkernel Unix
Connection structure:           3-D Torus
Compilers:                      Fortran 95, C, C++, UPC, Co-Array Fortran
Vendor's information Web page:  www.cray.com/Products/XE/CrayXE6System
Year of introduction:           2010

System parameters:

Model                        Cray XE6
Clock cycle                  2.3 GHz
Theor. peak performance
  Per Processor              12×9.2 Gflop/s
  Per Cabinet                21.2 Tflop/s
  Max. Configuration         —
Memory
  Per Cabinet                ≤ 6.14 TB
  Max. Configuration         —
No. of processors
  Per Cabinet                192
  Max. Configuration         —
Communication bandwidth
  Point-to-point             ≤ 8.3 GB/s
  Bisectional/cabinet        2.39 TB/s
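
The peak performance figures in the table follow from simple arithmetic, which the short sketch below reproduces. The value of 4 floating-point operations per core per cycle is an assumption inferred from the table (9.2 Gflop/s per core at 2.3 GHz); it is not stated in the text. The constant and function names are purely illustrative.

```python
# Values taken from the system parameters table.
CLOCK_GHZ = 2.3
CORES_PER_PROCESSOR = 12
PROCESSORS_PER_CABINET = 192

# Assumption: inferred from 9.2 Gflop/s per core at 2.3 GHz, not given in the text.
FLOPS_PER_CYCLE = 4


def core_peak_gflops() -> float:
    """Theoretical peak of a single core in Gflop/s."""
    return CLOCK_GHZ * FLOPS_PER_CYCLE


def processor_peak_gflops() -> float:
    """Theoretical peak of one 12-core processor in Gflop/s."""
    return CORES_PER_PROCESSOR * core_peak_gflops()


def cabinet_peak_tflops() -> float:
    """Theoretical peak of one cabinet (192 processors) in Tflop/s."""
    return PROCESSORS_PER_CABINET * processor_peak_gflops() / 1000.0


if __name__ == "__main__":
    print(f"per core:      {core_peak_gflops():.1f} Gflop/s")
    print(f"per processor: {processor_peak_gflops():.1f} Gflop/s")
    print(f"per cabinet:   {cabinet_peak_tflops():.1f} Tflop/s")
```

Rounded to one decimal, the per-cabinet result agrees with the 21.2 Tflop/s quoted in the table.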

Remarks:
The structure of the Cray machines has proved very stable over the years: a 3-D torus connects the processor nodes. Both the nodes and the routers have developed considerably, however: the nodes from the earliest XT systems with a single AMD core to the XE6 node with two 12-core Magny-Cours processors, and the interconnect routers from the first SeaStar router to the new Gemini, perhaps the most distinguishing feature of the system. The Gemini is based on the 48-port YARC chip, which boasts an internal aggregate bandwidth of 160 GB/s. Since the Gemini Network Interface Card (NIC) operates at 650 MHz and can transfer 64 B every 5 cycles, the bandwidth per direction is 8.3 GB/s, while the latency varies from 0.7 to 1.4 µs depending on the type of transfer [1]. In practice, bandwidths of over 6 GB/s per direction were measured, compatible with the claim in Cray's brochure of an injection bandwidth of over 20 GB/s per node. A nice feature of the Gemini router is that it supports adaptive routing, even on a packet-by-packet basis. As the 3-D torus topology is vulnerable to link failures, this makes the network much more robust.
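
The per-direction bandwidth quoted above can be checked with a few lines of arithmetic. The sketch below uses only the numbers from the text (650 MHz, 64 B per transfer, 5 cycles per transfer, 6 GB/s measured); the function names are illustrative and not part of any Cray software.

```python
def nic_bandwidth_gb_per_s(clock_mhz: float = 650.0,
                           bytes_per_transfer: int = 64,
                           cycles_per_transfer: int = 5) -> float:
    """Theoretical NIC bandwidth per direction in GB/s (1 GB = 1e9 bytes)."""
    transfers_per_second = clock_mhz * 1e6 / cycles_per_transfer
    return transfers_per_second * bytes_per_transfer / 1e9


def efficiency(measured_gb_per_s: float, peak_gb_per_s: float) -> float:
    """Fraction of the theoretical bandwidth that was actually measured."""
    return measured_gb_per_s / peak_gb_per_s


if __name__ == "__main__":
    peak = nic_bandwidth_gb_per_s()
    print(f"theoretical: {peak:.2f} GB/s per direction")
    # The text reports "over 6 GB/s" measured, so this is a lower bound.
    print(f"measured/theoretical: at least {efficiency(6.0, peak):.0%}")
```

With these inputs the theoretical value is 8.32 GB/s, which the text rounds to 8.3 GB/s; the measured figure of over 6 GB/s then corresponds to roughly 72% or more of the theoretical bandwidth.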

Besides the compute nodes there are I/O nodes that can be configured as interactive nodes or as nodes that connect to background storage. The I/O nodes contain only one Opteron processor but, in contrast to the compute nodes, they run a full Linux operating system. The compute nodes run a special Linux variant, called Extreme Scalability Mode, that greatly reduces the variability of the runtimes of similar tasks. This ensures very predictable execution times, as no interference from system tasks occurs. Such interference, the so-called OS-jitter, can be quite detrimental to overall performance, especially for very large machine configurations. The IBM BlueGene systems (see the BlueGene systems) employ a similar separation between compute and service nodes.
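
Why OS-jitter hurts more on larger machines can be illustrated with a deliberately simplified model. The sketch assumes a bulk-synchronous application in which every step waits for the slowest node, and it assumes that each node is independently interrupted by a system task with a fixed probability per step. Both the model and the probability value used in the example are assumptions made for illustration, not measurements of any Cray system.

```python
def prob_step_delayed(num_nodes: int, p_node_interrupted: float) -> float:
    """Probability that at least one of num_nodes nodes is interrupted in a step.

    Assumes independent interruptions; in a bulk-synchronous code a single
    interrupted node delays the whole step.
    """
    return 1.0 - (1.0 - p_node_interrupted) ** num_nodes


if __name__ == "__main__":
    p = 1e-4  # hypothetical per-node, per-step interruption probability
    for nodes in (1, 100, 10_000, 100_000):
        print(f"{nodes:>7} nodes: {prob_step_delayed(nodes, p):.4f}")
```

Under these assumptions an interruption that is negligible on a single node (one step in ten thousand) delays about 1% of the steps on 100 nodes and nearly every step on 100,000 nodes, which is why removing system tasks from the compute nodes matters most for the largest configurations.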

Cray offers the usual compilers and AMD's ACML numerical library, but also its own scientific library and compilers for the PGAS languages UPC and Co-Array Fortran (CAF). Besides Cray's MPI implementation, its shmem library for one-sided communication is also available.
Although no XE6m model is available yet, one can be expected, analogous to the predecessors XT5 and XT5m, where "m" stands for midrange. In that case the XE6m will have at most 6 cabinets with a peak speed of just over 125 Tflop/s. For the XE6 model itself no maximum configuration is given; the Cray documentation suggests that more than a million cores would be possible.
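
The configuration sizes mentioned here can be related to the per-cabinet figures (21.2 Tflop/s, 192 processors of 12 cores each) with a short calculation. The sketch below is only an illustration; the function names are hypothetical, and the cabinet count for a million cores is a derived estimate, not a figure from the Cray documentation.

```python
import math

# Per-cabinet figures as given for the XE6.
CABINET_PEAK_TFLOPS = 21.2
PROCESSORS_PER_CABINET = 192
CORES_PER_PROCESSOR = 12


def system_peak_tflops(cabinets: int) -> float:
    """Theoretical peak in Tflop/s of a system with the given number of cabinets."""
    return cabinets * CABINET_PEAK_TFLOPS


def cabinets_for_cores(cores: int) -> int:
    """Smallest number of full cabinets that provides at least `cores` cores."""
    cores_per_cabinet = PROCESSORS_PER_CABINET * CORES_PER_PROCESSOR
    return math.ceil(cores / cores_per_cabinet)


if __name__ == "__main__":
    print(f"6 cabinets: {system_peak_tflops(6):.1f} Tflop/s")
    print(f"1,000,000 cores: {cabinets_for_cores(1_000_000)} cabinets")
```

Six cabinets give 127.2 Tflop/s, consistent with "just over 125 Tflop/s", and a million cores would need 435 cabinets at 2,304 cores per cabinet.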

Measured Performances:
The Cray XE6 was introduced in May 2010 and as yet no performance results are available.