System parameters:
Remarks: The Altix UV 2 series is the latest (6th) generation of ccNUMA shared-memory systems made by SGI and the second in the Altix UV series. Apart from the UV 2000 there is also a 4-socket UV 20, but this system is too small to be discussed here. The processor used is the Intel Ivy Bridge. The distinguishing factor of the UV systems is their distributed shared memory, which can be up to 8.2 TB. Every blade can carry up to 128 GB of memory, which is shared in a ccNUMA fashion through hubs and NumaLink6, the 6th generation of SGI's proprietary NumaLink interconnect. NumaLink6 is a very high-speed interconnect with a point-to-point bandwidth of 6.7 GB/s per direction, double that of its predecessor, NumaLink5. Like Bull, Cray, and Eurotech, SGI offers the possibility to replace two CPUs on a blade with either NVIDIA Tesla K20X GPUs or Xeon Phi accelerators.
A UV blade contains four 12-core Ivy Bridge processors, connected to each other by two QPI links, while each processor also connects to the Northbridge chipset for I/O, etc. Lastly, all processors are connected via QPI links to the UV hub, which takes care of the communication with the rest of the system. The bandwidth from the hub to the processors is 25.6 GB/s, while the 4 ports for outside communication provide approximately 13.5 GB/s each.
The hub does much more than act as a simple router. It ensures cache coherency in the distributed shared memory. It also contains an Active Memory Unit that supports atomic memory operations and takes care of thread synchronisation. The Global Register Unit (GRU) within the hub extends the x86 addressing mode (44-bit physical, 48-bit virtual) to 53 and 60 bits, respectively, to accommodate the potentially very large global address space of the system. In addition, it houses an external TLB cache that enables large memory page support. Furthermore, it can perform asynchronous block copy operations akin to the block transfer unit in Cray's Gemini and Aries routers.
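The address widths, memory sizes and bandwidths quoted above lend themselves to some quick arithmetic. The Python sketch that follows derives the addressable space for each address width and a rough full-system sizing. It is only a back-of-the-envelope check: the blade count (64) is our own inference from dividing the 8.2 TB maximum by 128 GB per blade, not a figure stated by SGI, and the NumaLink5 bandwidth is simply half the NumaLink6 figure, as implied by "double".

```python
# Back-of-the-envelope arithmetic from the figures quoted in the text.
# Assumption: the 8.2 TB maximum corresponds to fully populated 128 GB blades.

TiB = 2**40

def addressable_bytes(bits: int) -> int:
    """Number of bytes reachable with an address of the given width."""
    return 2**bits

widths = {
    "x86 physical": 44,
    "x86 virtual": 48,
    "UV physical": 53,
    "UV virtual": 60,
}
for name, bits in widths.items():
    print(f"{name:<13} {bits} bits -> {addressable_bytes(bits) // TiB:,} TiB")

# Extending physical addresses from 44 to 53 bits multiplies the reach by 2**9.
phys_growth = addressable_bytes(53) // addressable_bytes(44)

# Hypothetical full configuration: blades needed to reach ~8.2 TB at 128 GB each.
mem_per_blade_gb = 128
blades = round(8.2e3 / mem_per_blade_gb)      # inferred, not a quoted figure
total_mem_gb = blades * mem_per_blade_gb       # 8192 GB, i.e. about 8.2 TB
cores = blades * 4 * 12                        # four 12-core processors per blade

# Hub bandwidth: aggregate of the 4 external ports vs. the link to the processors.
external_gb_s = 4 * 13.5                       # about 54 GB/s in total
to_processors_gb_s = 25.6

# "Double that of NumaLink5" implies roughly 3.35 GB/s per direction there.
numalink6_gb_s = 6.7
numalink5_gb_s = numalink6_gb_s / 2

print(f"growth factor {phys_growth}, {blades} blades, {total_mem_gb} GB, {cores} cores")
print(f"external hub ports {external_gb_s} GB/s vs {to_processors_gb_s} GB/s to CPUs")
```

Note that a 53-bit physical address (8 PiB) leaves ample headroom above the 8.2 TB that can actually be configured, whereas the native 44-bit x86 limit (16 TiB) would already be uncomfortably close.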
In addition, the GRU accommodates scatter/gather operations, which can greatly speed up cache-unfriendly sparse algorithms. Lastly, MPI operations can be off-loaded from the CPU: barriers and the synchronisation for reduction operations are handled by the MPI Offload Engine (MOE). The UV systems come with the usual Intel stack of compilers and tools. To take full advantage of the facilities of the hub, it is advisable to use SGI's MPI implementation from its Message Passing Toolkit, although independent implementations, like Open MPI, will also work.
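To make "cache-unfriendly sparse algorithms" concrete, the following plain Python sketch shows the indirect gather and scatter access patterns that occur in a compressed sparse row (CSR) matrix-vector product. It only illustrates the memory-access pattern: it does not use any GRU interface, and the function names are hypothetical. It is this kind of irregular, index-driven memory traffic that the hub's scatter/gather support is meant to accelerate.

```python
# Illustration of gather/scatter access patterns (plain Python, no GRU interface).
# Function names are hypothetical and chosen for clarity.

def gather(src, indices):
    """Indirect read: collect src[i] for every i in indices."""
    return [src[i] for i in indices]

def scatter_add(dst, indices, values):
    """Indirect write: add values[k] into dst[indices[k]], in place."""
    for i, v in zip(indices, values):
        dst[i] += v
    return dst

def csr_matvec(row_ptr, col_idx, vals, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    The reads of x follow col_idx, i.e. they form a gather whose
    addresses are scattered through memory rather than contiguous.
    """
    y = []
    for r in range(len(row_ptr) - 1):
        start, end = row_ptr[r], row_ptr[r + 1]
        xs = gather(x, col_idx[start:end])
        y.append(sum(a * b for a, b in zip(vals[start:end], xs)))
    return y

# A = [[4, 0, 1],
#      [0, 2, 0],
#      [3, 0, 5]] in CSR form
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
vals = [4, 1, 2, 3, 5]

y = csr_matvec(row_ptr, col_idx, vals, [1, 2, 3])   # [7, 4, 18]
acc = scatter_add([0, 0, 0], [0, 2, 0], [1, 2, 3])  # [4, 0, 2]
```

In a distributed shared memory the entries of `x` touched by such a gather may reside on remote blades, so each indirect access can turn into a separate small remote transfer; that is why hardware help for these operations can pay off.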
Measured Performances: