System parameters:
Remarks: The Altix UV is the latest (5th) generation of ccNUMA shared-memory systems made by SGI. Unlike in the two preceding generations, the processor used is not from the Intel Itanium line but from the Xeon family: the Xeon X7500, or Nehalem EX. We only present the UV 100 and UV 1000 models here as the UV 10 falls below our performance criterion. The UV 100 is in almost all respects just a smaller version of the UV 1000. Presumably only the packaging and the interconnect topology differ, but the available information about the interconnect topology is somewhat confusing. SGI's fact sheet about the UV systems contains the information stated above, but a white paper from 2009 gives a detailed picture of a fat tree interconnection at the 8-blade chassis level and for a 2048-core system. A 2-D torus is described only above 2048 cores (the size of the current UV 1000), for systems of up to 262,144 cores. For the moment we assume that the information in the fact sheet is the more likely to be correct.
A UV blade contains two X7500 processors, connected to each other by two QPI links, while each processor also connects to the Northbridge chipset for I/O, etc. Lastly, both processors are connected via a QPI link to the UV hub that takes care of the communication with the rest of the system. The bandwidth from the hub to the processors is 25.6 GB/s, while each of the 4 ports for outside communication provides approximately 10 GB/s.
The hub does much more than act as a simple router. It ensures cache coherency in the distributed shared memory. There is an Active Memory Unit that supports atomic memory operations and takes care of thread synchronisation. The Global Register Unit (GRU) within the hub also extends the x86 addressing modes (44-bit physical, 48-bit virtual) to 53 and 60 bits, respectively, to accommodate the potentially very large global address space of the system. In addition it houses an external TLB cache that enables large memory page support.
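To give a feeling for what the address extension by the GRU means, the short sketch below converts the address widths quoted above into the size of the corresponding address space. It assumes plain byte addressing and that every address bit is usable, which the text does not state explicitly; the function name addressable is ours and only serves the illustration.

```python
# Address widths in bits, as quoted in the text.
X86_PHYSICAL_BITS = 44
X86_VIRTUAL_BITS = 48
UV_PHYSICAL_BITS = 53
UV_VIRTUAL_BITS = 60

UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]


def addressable(bits):
    """Size of a byte-addressed space with `bits` address bits.

    Assumption (not from the text): one address per byte and all
    address bits usable. The division is exact because the size is
    always a power of two.
    """
    size = 1 << bits
    unit = 0
    while size >= 1024 and unit < len(UNITS) - 1:
        size //= 1024
        unit += 1
    return f"{size} {UNITS[unit]}"


if __name__ == "__main__":
    for label, bits in [
        ("x86 physical", X86_PHYSICAL_BITS),
        ("x86 virtual", X86_VIRTUAL_BITS),
        ("UV physical", UV_PHYSICAL_BITS),
        ("UV virtual", UV_VIRTUAL_BITS),
    ]:
        print(f"{label:13s} {bits} bits -> {addressable(bits)}")
```

Under these assumptions the extension takes the physical address space from 16 TiB to 8 PiB and the virtual address space from 256 TiB to 1 EiB.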
Furthermore, the GRU can perform asynchronous block copy operations akin to those of the block transfer unit in Cray's Gemini router. In addition, the GRU accommodates scatter/gather operations, which can greatly speed up cache-unfriendly sparse algorithms. Lastly, MPI operations can be off-loaded from the CPU: barriers and the synchronisation for reduction operations are taken care of in the MPI Offload Engine (MOE). The UV systems come with the usual Intel stack of compilers and tools. To take full advantage of the facilities of the hub it is advised to use SGI's MPI version based on its Message Passing Toolkit, although independent implementations, like OpenMPI, will also work.
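The scatter/gather operations mentioned above correspond to the indirect access patterns that dominate sparse algorithms. As a minimal software-only sketch (it does not use the GRU or any SGI interface, and the names gather, scatter_add and csr_matvec are hypothetical), the following shows where a gather occurs in a sparse matrix-vector product with the matrix in compressed sparse row (CSR) form:

```python
def gather(x, indices):
    """Collect x[i] for every i in indices (indirect, cache-unfriendly reads)."""
    return [x[i] for i in indices]


def scatter_add(y, indices, values):
    """Add values[k] to y[indices[k]] (indirect writes); modifies y in place."""
    for i, v in zip(indices, values):
        y[i] += v


def csr_matvec(row_ptr, col_idx, vals, x):
    """Compute y = A * x for a matrix A stored in CSR format.

    row_ptr[r]:row_ptr[r + 1] delimits the nonzeros of row r,
    col_idx holds their column numbers and vals their values.
    """
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        start, end = row_ptr[row], row_ptr[row + 1]
        # The gather step: elements of x are fetched at irregular positions.
        gathered = gather(x, col_idx[start:end])
        y[row] = sum(a * b for a, b in zip(vals[start:end], gathered))
    return y


if __name__ == "__main__":
    # A = [[1, 0, 2],
    #      [0, 3, 0],
    #      [4, 0, 5]]
    row_ptr = [0, 2, 3, 5]
    col_idx = [0, 2, 1, 0, 2]
    vals = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(csr_matvec(row_ptr, col_idx, vals, [1.0, 2.0, 3.0]))
```

The reads of x through col_idx are the part that caches handle poorly, because consecutive accesses may touch widely separated memory locations. This is the kind of access that hardware scatter/gather support is meant to help with.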
Measured Performances: