System parameters:
Remarks: The XK7 machine has the structure of the Cray XE6 (see above) but in a node two of the Opteron processors have been replaced by NVIDIA GPUs. For appropriate applications this will boost the performance more than 5-fold. Because the application speed is so dependent on the application and the the amount of data to be shipped back and forth between the GPU's memory and the system memory no sensible speed estimate can begiven, except that for a cabinet the performance may well exceed 100 Tflop/s when the application is right. Apart from the usual software stack for Cray products of coarse CUDA and OpenCL are supported for the GPUs and also OpenACC, the OpenMP-like directive/pragma-based library and runtime that should make it easier for the general programmer to take advantage of the GPUs. Measured Performances: In [39] a speed of 17.59 Pflop/s was reported on the 560640-core XK7 Titan machine of ONRL, for the solution of a linear system of unspecified size. The efficiency was 64.9%; surprisingly high for a GPU-based system. |