The IBM eServer p775


    Machine type RISC-based distributed-memory multi-processor.
    Models IBM eServer p775.
    Operating system AIX (IBM's Unix variant), Linux (Red Hat EL).
    Connection structure Variable (see remarks).
    Compilers XL Fortran (Fortran 90), (HPF), XL C, C++.
    Vendors information Web page http://www-03.ibm.com/systems/power/hardware/775/index.html
    Year of introduction 2011.

    System parameters:

    Model                     eServer p775
    Clock cycle               3.83 GHz
    Theor. peak performance
      Per proc. (8 cores)     245.1 Gflop/s
      Per node (32 procs)     7.84 Tflop/s
      Per rack (12 nodes)     94.1 Tflop/s
      Maximal (2048 nodes)    16.05 Pflop/s
    Main memory
      Memory/node             ≤ 2 TB
      Memory/maximal          ≤ 4.096 PB
    Communication bandwidth
      Node-to-node            (see remarks)
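
    The peak figures in the table follow directly from the clock frequency and the core counts; a minimal sketch of the arithmetic (the 8 double-precision flops per cycle per POWER7 core and the 2048-node maximal configuration are taken from the processor description and the remarks below):

        #include <stdio.h>

        /* Back-of-the-envelope check of the peak-performance table.
           Assumes 8 DP flops/cycle/core for the POWER7. */
        int main(void)
        {
            const double clock_ghz       = 3.83;
            const int    flops_per_cycle = 8;     /* per core, assumed       */
            const int    cores_per_proc  = 8;     /* octo-core POWER7        */
            const int    procs_per_node  = 32;    /* 8 QCMs x 4 processors   */
            const int    nodes_per_rack  = 12;
            const int    nodes_max       = 2048;  /* maximal configuration   */

            double proc_gf = clock_ghz * flops_per_cycle * cores_per_proc;
            double node_tf = proc_gf * procs_per_node / 1000.0;

            printf("Per proc.: %.1f Gflop/s\n", proc_gf);                    /* 245.1 */
            printf("Per node : %.2f Tflop/s\n", node_tf);                    /* 7.84  */
            printf("Per rack : %.1f Tflop/s\n", node_tf * nodes_per_rack);   /* 94.1  */
            printf("Maximal  : %.2f Pflop/s\n", node_tf * nodes_max / 1000.0);
            /* prints 16.06; the table's 16.05 reflects intermediate rounding */
            return 0;
        }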

    Remarks:

    There is a multitude of high-end servers in the eServer p-series, but IBM singles out the POWER7-based p775 model specifically for HPC. The eServer p775 is the successor of the earlier POWER6-based p575 systems and retains much of their macro structure: multi-CPU nodes are connected within a frame either by a dedicated switch or by other means, such as switched Ethernet. The structure of the nodes, however, has changed considerably (see the POWER7 section). Four octo-core POWER7 processors are housed in a Quad-Chip Module (QCM), and eight of these QCMs constitute a p775 node, so 256 cores make up a node. The eight QCMs are all directly connected to each other by copper wiring.
    In contrast to the earlier p575 clusters, IBM now provides a proprietary interconnect for the system, based on in-house optical technology. Each node contains 224 optical transceivers, each comprising 12 send and receive lanes of 1.25 GB/s. Ten of the 12 lanes are used for normal communication, while the other two act as a fall-back when one of the regular links fails. The number of links per node is sufficient to connect directly to 127 other nodes; to reach the maximal configuration of 2048 nodes, a second level of interconnection can be realised through hub modules. Depending on the relative position of the nodes the bandwidth varies: 336 GB/s to the 7 other QCMs within a node, 320 GB/s to remote nodes, and 240 GB/s from local to remote nodes. Note that these are aggregate bandwidths over all lanes together.
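    For completeness, the per-node optical link budget implied by these lane counts can be tallied as follows (treating the 1.25 GB/s as the rate of a single lane in one direction is an assumption of this sketch):

        #include <stdio.h>

        /* Optical link budget of one p775 node, from the lane counts above. */
        int main(void)
        {
            const int    transceivers = 224;   /* per node                     */
            const int    lanes_each   = 12;    /* send and receive lanes each  */
            const int    usable       = 10;    /* 2 of 12 held back as spares  */
            const double lane_gb_s    = 1.25;  /* per lane, assumed per direction */

            printf("Raw   : %.0f GB/s\n", transceivers * lanes_each * lane_gb_s); /* 3360 */
            printf("Usable: %.0f GB/s\n", transceivers * usable * lane_gb_s);     /* 2800 */
            return 0;
        }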
    Like the former p575, the p775 system is accessed through a front-end control workstation that also monitors system failures. Failing nodes can be taken offline and exchanged without interrupting service. Because of the very dense packaging, the units that house the POWER7 processors are water cooled.
    Applications can be run using PVM or MPI. IBM used to support High Performance Fortran, both in a proprietary version and with a compiler from the Portland Group; it is not clear whether this is still the case. IBM uses its own PVM version, from which the data-format converter XDR has been stripped. This results in lower overhead at the cost of generality. The MPI implementation, MPI-F, is likewise optimised for the p775-based systems. As the nodes are in effect shared-memory SMP systems, OpenMP can be employed for shared-memory parallelism within a node, and it can be freely mixed with MPI if needed; a minimal example of this mixed model is sketched below. In addition to its own AIX operating system, IBM supports one Linux distribution: the professional version of Red Hat Linux is available for the p775 series.
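    As an illustration of the mixed-mode programming described above, the sketch below uses only generic MPI and OpenMP calls; nothing in it is specific to MPI-F or the p775. With IBM's tool chain one would typically compile with a thread-safe wrapper such as mpcc_r and -qsmp=omp, with a generic tool chain via mpicc -fopenmp:

        #include <stdio.h>
        #include <mpi.h>
        #include <omp.h>

        /* MPI ranks across nodes, OpenMP threads within a shared-memory node. */
        int main(int argc, char **argv)
        {
            int rank, size;
            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);

            #pragma omp parallel
            printf("MPI rank %d of %d, OpenMP thread %d of %d\n",
                   rank, size, omp_get_thread_num(), omp_get_num_threads());

            MPI_Finalize();
            return 0;
        }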

    Measured Performances:
    In [39] a speed of 1.52 Pflop/s was reported for a 63,360-core system. The efficiency for solving the dense linear system was 77.9%.
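
    The quoted efficiency is simply the measured speed divided by the theoretical peak of the cores used; a quick check with the per-core peak derived above (the small difference from the quoted 77.9% presumably stems from rounding of the reported 1.52 Pflop/s):

        #include <stdio.h>

        /* Efficiency check for the dense-linear-system result reported in [39]. */
        int main(void)
        {
            const double cores    = 63360.0;
            const double core_gf  = 245.12 / 8.0;  /* 30.64 Gflop/s per core  */
            const double measured = 1.52e6;        /* 1.52 Pflop/s in Gflop/s */

            double peak = cores * core_gf;         /* ~1.94e6 Gflop/s */
            printf("Efficiency: %.1f%%\n", 100.0 * measured / peak);  /* ~78.3% */
            return 0;
        }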