System parameters:
Remarks: Until the introduction of the XC30, the structure of the Cray machines was very stable over the years: a 3-D torus that connects the processor nodes. The XE6 is the last in this line. The nodes as well as the routers have gone through quite a development, however: from the earliest XT systems with a single AMD core to the XE6 node with two 16-core Interlagos processors. The interconnect routers have likewise evolved from the first SeaStar router to the new Gemini, perhaps the most distinguishing factor of the system. The Gemini is based on the 48-port YARC chip that boasts a 160 GB/s internal aggregate bandwidth. Since the Gemini Network Interface Card (NIC) operates at 650 MHz and is able to transfer 64 B every 5 cycles, the bandwidth per direction is 8.3 GB/s, while the latency varies from 0.7 to 1.4 µs depending on the type of transfer [1]. In practice, bandwidths of over 6 GB/s per direction were measured, compatible with the claim in Cray's brochure of an injection bandwidth of over 20 GB/s/node. A nice feature of the Gemini router is that it supports adaptive routing, even on a packet-by-packet basis. As the 3-D torus topology is vulnerable to link failures, this makes the network much more robust.

Besides the compute nodes there are I/O nodes that can be configured as interactive nodes or as nodes that connect to background storage. The I/O nodes contain only one Opteron processor but, in contrast to the compute nodes, they run a full Linux operating system. The compute nodes run a special Linux variant, called Extreme Scalability Mode, that greatly reduces the variability of the runtimes of similar tasks. This ensures very predictable execution times, as no interference from system tasks occurs. Such interference, the so-called OS jitter, can be quite detrimental to overall performance, especially for very large machine configurations.
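The per-direction bandwidth quoted above for the Gemini NIC follows directly from the clock rate and the transfer size. The short Python sketch below only reproduces that arithmetic from the figures in the text; the function and constant names are illustrative and do not come from any Cray documentation, and 1 GB is taken as 10^9 bytes.

```python
# Gemini NIC figures as quoted in the text.
NIC_CLOCK_HZ = 650e6       # NIC clock frequency: 650 MHz
BYTES_PER_TRANSFER = 64    # 64 B moved per transfer
CYCLES_PER_TRANSFER = 5    # one transfer every 5 cycles


def nic_bandwidth_gb_per_s(clock_hz, bytes_per_transfer, cycles_per_transfer):
    """Peak bandwidth per direction in GB/s (assuming 1 GB = 10**9 bytes)."""
    transfers_per_second = clock_hz / cycles_per_transfer
    return transfers_per_second * bytes_per_transfer / 1e9


peak = nic_bandwidth_gb_per_s(NIC_CLOCK_HZ, BYTES_PER_TRANSFER, CYCLES_PER_TRANSFER)
print(f"Peak bandwidth per direction: {peak:.2f} GB/s")

# Fraction of that peak represented by a measured 6 GB/s per direction.
measured = 6.0
print(f"{measured:.1f} GB/s corresponds to {measured / peak:.0%} of the peak")
```

The result, 8.32 GB/s, rounds to the 8.3 GB/s stated above, and a measured 6 GB/s corresponds to roughly 72% of it.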
A similar separation between compute and service nodes is employed in the IBM BlueGene systems (see the section on the BlueGene systems). Cray offers the usual compilers and AMD's ACML numerical library, but also its own scientific library and compilers for the PGAS languages UPC and Co-Array Fortran (CAF). Besides Cray's MPI implementation, its shmem library for one-sided communication is also available.

In 2011 the XE6m model became available, where "m" stands for midrange. The XE6m has at most 6 cabinets with a peak speed of just over 120 Tflop/s. A further rationalisation is that a 2-D rather than a 3-D torus is employed as the interconnection network. For the XE6 model itself no maximum configuration is given. The Cray documentation suggests that more than a million cores would be possible.

Measured Performances: In [39] a speed of 1.11 Pflop/s was reported for a 142272-core XE6, based on 2.4 GHz Istanbul processors, for the solution of a linear system of unspecified size. The efficiency was 81.3%.
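As a rough consistency check on the measured figures, the quoted efficiency can be compared with a nominal peak computed from the core count and clock frequency. The sketch below assumes 4 double-precision floating-point operations per core per cycle; that value is an assumption about the processor and is not stated in the text, and the variable names are purely illustrative.

```python
# Figures quoted in the text for the measurement in [39].
MEASURED_PFLOPS = 1.11     # measured speed
QUOTED_EFFICIENCY = 0.813  # 81.3%
CORES = 142272
CLOCK_GHZ = 2.4

# ASSUMPTION (not from the text): each core completes 4 double-precision
# floating-point operations per clock cycle.
FLOPS_PER_CYCLE = 4

# Peak implied by the measured speed and the quoted efficiency.
implied_peak = MEASURED_PFLOPS / QUOTED_EFFICIENCY

# Nominal peak: cores x GHz x flops/cycle gives Gflop/s; 1e6 Gflop/s = 1 Pflop/s.
nominal_peak = CORES * CLOCK_GHZ * FLOPS_PER_CYCLE / 1e6

print(f"Peak implied by efficiency: {implied_peak:.3f} Pflop/s")
print(f"Nominal peak (assumed 4 flops/cycle): {nominal_peak:.3f} Pflop/s")
print(f"Efficiency against nominal peak: {MEASURED_PFLOPS / nominal_peak:.1%}")
```

Under that assumption the nominal peak is about 1.366 Pflop/s, and the measured 1.11 Pflop/s corresponds to 81.3% of it, in line with the quoted efficiency.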