System parameters:
Remarks: The structure of the Cray machines has proved very stable over the years: a 3-D torus that connects the processor nodes. The nodes and the routers, however, have gone through considerable development, from the earliest XT systems with a single AMD core per node to the XE6 with two 12-core Magny-Cours processors per node. The interconnect routers have likewise evolved, from the first SeaStar router to the new Gemini, which is perhaps the most distinguishing feature of the system. The Gemini is based on the 48-port YARC chip, which boasts an internal aggregate bandwidth of 160 GB/s. Since the Gemini Network Interface Card (NIC) operates at 650 MHz and can transfer 64 B every 5 cycles, the bandwidth per direction is 8.3 GB/s, while the latency varies from 0.7 to 1.4 µs depending on the type of transfer [1]. In practice, bandwidths of over 6 GB/s per direction have been measured, which is compatible with the claim in Cray's brochure of an injection bandwidth of over 20 GB/s per node. A nice feature of the Gemini router is that it supports adaptive routing, even on a packet-by-packet basis. Because a 3-D torus is vulnerable to link failures, this makes the network much more robust.

Besides the compute nodes there are I/O nodes that can be configured either as interactive nodes or as nodes that connect to background storage. The I/O nodes contain only one Opteron processor but, in contrast to the compute nodes, they run a full Linux operating system. The compute nodes run a special Linux variant, called Extreme Scalability Mode, that greatly reduces the variability of the run times of similar tasks. This ensures very predictable execution times because no interference from system tasks occurs. Such so-called OS jitter can be quite detrimental to overall performance, especially in very large machine configurations. The IBM BlueGene systems (see the BlueGene systems) employ a similar separation between compute and service nodes.
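The per-direction bandwidth quoted above follows directly from the NIC clock and transfer width. The short Python sketch below redoes that arithmetic using only the figures given in the text (650 MHz, 64 B per 5 cycles, over 6 GB/s measured); the function name `nic_bandwidth_gb_per_s` is a hypothetical helper for illustration, and GB is taken as 10^9 bytes.

```python
# Back-of-the-envelope check of the Gemini NIC per-direction bandwidth:
# 64 bytes moved every 5 clock cycles at 650 MHz (figures from the text).

NIC_CLOCK_HZ = 650e6      # Gemini NIC clock frequency
BYTES_PER_TRANSFER = 64   # payload moved per transfer
CYCLES_PER_TRANSFER = 5   # clock cycles needed per transfer


def nic_bandwidth_gb_per_s(clock_hz: float, bytes_per_transfer: int,
                           cycles_per_transfer: int) -> float:
    """Return the peak per-direction bandwidth in GB/s (1 GB = 10**9 bytes)."""
    transfers_per_second = clock_hz / cycles_per_transfer
    return transfers_per_second * bytes_per_transfer / 1e9


if __name__ == "__main__":
    peak = nic_bandwidth_gb_per_s(NIC_CLOCK_HZ, BYTES_PER_TRANSFER,
                                  CYCLES_PER_TRANSFER)
    measured = 6.0  # lower bound of the measured per-direction bandwidth
    print(f"peak per direction: {peak:.2f} GB/s")        # 8.32 GB/s
    print(f"measured/peak: at least {measured / peak:.0%}")  # 72%
```

The result, 8.32 GB/s, matches the 8.3 GB/s in the text, and the measured figure of over 6 GB/s corresponds to at least roughly 72% of that peak.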
Cray offers the usual compilers and AMD's ACML numerical library, but also its own scientific library and compilers for the PGAS languages UPC and Co-Array Fortran (CAF). Besides Cray's MPI implementation, its SHMEM library for one-sided communication is also available. Although no such model exists yet, an XE6m model can be expected, analogous to the predecessors XT5 and XT5m, where "m" stands for midrange. If so, the XE6m will have at most 6 cabinets with a peak speed of just over 125 Tflop/s. No maximum configuration is given for the XE6 itself; the Cray documentation suggests that more than a million cores would be possible.
Measured Performances:
The Cray XE6 was introduced in May 2010, and as yet no performance results are available.