Machine type: RISC-based distributed-memory multi-processor.
Models: IBM eServer p775
Operating system: AIX (IBM's Unix variant), Linux (Red Hat EL).
Connection structure: Variable (see remarks)
Compilers: XL Fortran (Fortran 90), HPF, XL C, C++
Vendor's information Web page: http://www-03.ibm.com/systems/power/hardware/775/index.html
Year of introduction: 2011
System parameters:
Model: eServer p775
Clock cycle: 3.83 GHz
Theor. peak performance:
  Per Proc. (8 cores): 245.1 Gflop/s
  Per node (32 procs.): 7.84 Tflop/s
  Per 14-node frame: 8.42 Tflop/s
  Per 12-node rack: 94.1 Tflop/s
  Maximal: 16.05 Pflop/s
Main memory:
  Memory/node: ≤ 2 TB
  Memory/maximal: ≤ 4.096 PB
Communication bandwidth:
  Node-to-node (see remarks): —
Remarks:
There is a multitude of high-end servers in the eServer p-series. However, IBM
singles out the POWER7-based p775 model specifically for HPC. The eServer p775
is the successor of the earlier POWER6-based p575 systems and retains much of
the macro structure of that system: multi-CPU nodes are connected within a frame
either by a dedicated switch or by other means, like switched Ethernet. The
structure of the nodes, however, has changed considerably (see the POWER7
processor). Four octo-core POWER7 processors are housed in a Quad-Chip Module
(QCM), and eight of these QCMs constitute a p775 node, so 256 cores make up a
node. The 8 QCMs are all directly connected to each other by copper wires.
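As a rough check, the peak-performance figures in the parameter list above follow
directly from this node structure. The sketch below assumes (as the table implies)
that each POWER7 core delivers 8 floating-point operations per clock cycle; all
other numbers are taken from the entry itself.

#include <stdio.h>

int main(void)
{
    /* Clock rate and core counts as given in the entry; 8 flops per cycle
     * per core is an assumption inferred from the table:
     * 245.1 Gflop/s / (8 cores x 3.83 GHz) = 8. */
    const double clock_hz        = 3.83e9;
    const double flops_per_cycle = 8.0;   /* assumed, per POWER7 core */
    const int    cores_per_proc  = 8;     /* octo-core POWER7         */
    const int    procs_per_qcm   = 4;     /* Quad-Chip Module         */
    const int    qcms_per_node   = 8;     /* 32 processors, 256 cores */
    const int    nodes_per_rack  = 12;
    const int    max_nodes       = 2048;  /* maximal configuration    */

    double per_proc = clock_hz * flops_per_cycle * cores_per_proc;
    double per_node = per_proc * procs_per_qcm * qcms_per_node;

    printf("per processor: %.1f Gflop/s\n", per_proc / 1e9);   /* ~245.1 */
    printf("per node:      %.2f Tflop/s\n", per_node / 1e12);  /* ~7.84  */
    printf("per rack:      %.1f Tflop/s\n", per_node * nodes_per_rack / 1e12);
    printf("maximal:       %.2f Pflop/s\n", per_node * max_nodes / 1e15);
    return 0;
}

With these inputs the per-rack and maximal figures come out at about 94.1 Tflop/s
and 16.1 Pflop/s, in line with the parameter list.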
In contrast to the earlier p575 clusters, IBM now provides a proprietary
interconnect for the system, based on in-house optical technology. Each node
contains 224 optical transceivers, each of which comprises 12 send and receive
lanes of 1.25 GB/s. Ten of these 12 lanes are used for normal communication,
while the other two lanes can act as a fall-back when one of the regular links
fails. The number of links per node is sufficient to connect directly to 127
other nodes; to reach the maximal configuration of 2048 nodes, a second level
of interconnection can be realised through hub modules. Depending on the
relative position of the nodes the bandwidth varies: 336 GB/s to the 7 other
QCMs, 320 GB/s to remote nodes, and 240 GB/s from local to remote nodes. Note
that these are the aggregate bandwidths of all lanes together.
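The per-node optical capacity implied by these lane counts can be tallied in the
same way; a small sketch, assuming each of the 12 lanes carries 1.25 GB/s as the
text suggests:

#include <stdio.h>

int main(void)
{
    const int    transceivers  = 224;   /* optical transceivers per node */
    const int    lanes_total   = 12;    /* lanes per transceiver         */
    const int    lanes_regular = 10;    /* 2 lanes reserved as fall-back */
    const double gb_per_lane   = 1.25;  /* GB/s per lane (assumed rate)  */

    printf("raw optical bandwidth per node:     %.0f GB/s\n",
           transceivers * lanes_total * gb_per_lane);    /* 3360 GB/s */
    printf("regular-traffic bandwidth per node: %.0f GB/s\n",
           transceivers * lanes_regular * gb_per_lane);  /* 2800 GB/s */
    return 0;
}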
Like the former p575, the p775 system is accessed through a front-end control
workstation that also monitors system failures. Failing nodes can be taken
offline and exchanged without interrupting service. Because of the very dense
packaging, the units that house the POWER7 processors are water-cooled.
Applications can be run using PVM or MPI. IBM used to support High Performance
Fortran, both in a proprietary version and with a compiler from the Portland
Group; it is not clear whether this is still the case. IBM uses its own PVM
version, from which the data-format converter XDR has been stripped. This
results in lower overhead at the cost of generality. Also the MPI
implementation, MPI-F, is optimised for the p775-based systems. As the nodes
are in effect shared-memory SMP systems, OpenMP can be employed within the
nodes for shared-memory parallelism, and it can be freely mixed with MPI if
needed, as in the sketch below. In addition to its own AIX OS, IBM also
supports one Linux distribution: the professional version of Red Hat Linux is
available for the p775 series.
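A minimal sketch of such mixed-mode use, with MPI between nodes and OpenMP
within a node; it relies only on standard MPI and OpenMP calls and is not
specific to MPI-F or the XL compilers (file name and compile line are
illustrative only).

/* hybrid.c - compile e.g. with: mpicc -fopenmp hybrid.c
 * (flags differ for IBM's XL compilers). */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank, nranks;

    /* FUNNELED: only the master thread makes MPI calls. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    double local_sum = 0.0;

    /* Shared-memory parallelism inside the node, e.g. over (a subset of)
     * the 256 cores of a p775 node assigned to this MPI rank. */
    #pragma omp parallel reduction(+:local_sum)
    {
        int tid = omp_get_thread_num();
        local_sum += (double)(rank * omp_get_num_threads() + tid);
    }

    double global_sum = 0.0;
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0,
               MPI_COMM_WORLD);

    if (rank == 0)
        printf("ranks = %d, global sum = %g\n", nranks, global_sum);

    MPI_Finalize();
    return 0;
}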
Measured Performances:
In [39] a speed of 1.52 Pflop/s was reported
for a 63,360-core system. The efficiency for solving the dense linear system was
77.9%.