IBM BlueGene Processors

    The BlueGene/Q processor

    The BlueGene/Q processor, like its predecessors in the BlueGene/L and /P, uses a variant of the PowerPC family, this time the A2 processor that IBM introduced last year as a network processor. The technology used is 45 nm SOI.

    Unlike the two earlier BlueGene processors, this one is a full 64-bit processor. Where the BlueGene/P has 4 cores per processor, the BlueGene/Q processor contains no fewer than 18 cores. Sixteen of these are used for computation, one for OS tasks, and core 18 acts as a spare that in principle can be activated when one of the other cores fails, thus adding to the resiliency of the chip (although IBM made no promises that it actually will be a hot spare). Figure 14 shows a schematic of the chip layout. In contrast to the earlier BlueGene models there is now only one type of network: a 5-D torus. The connections to the outside world are too complicated to depict in this diagram, and further packaging details are deferred to the discussion of the system in the section on the IBM BlueGene system.
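
As an illustration of what a 5-D torus implies for connectivity, the following minimal Python sketch lists the nearest neighbours of a node, with wrap-around links in every dimension. The function name `torus_neighbors` and the 4 x 4 x 4 x 4 x 4 shape are hypothetical choices made only for this example; they are not taken from the BlueGene/Q packaging.

```python
from typing import List, Tuple

Coord = Tuple[int, ...]


def torus_neighbors(coord: Coord, dims: Coord) -> List[Coord]:
    """Return the distinct nearest neighbours of `coord` in a torus of shape `dims`.

    Each dimension contributes a link in the -1 and +1 direction; the modulo
    operation implements the wrap-around that turns a mesh into a torus.
    """
    neighbors: List[Coord] = []
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            shifted = list(coord)
            shifted[axis] = (coord[axis] + step) % size  # wrap-around link
            neighbor = tuple(shifted)
            # In a dimension of size 2 both directions reach the same node,
            # and in a dimension of size 1 the node would reach itself.
            if neighbor != coord and neighbor not in neighbors:
                neighbors.append(neighbor)
    return neighbors


# Hypothetical shape, chosen only for illustration.
dims = (4, 4, 4, 4, 4)
origin = (0, 0, 0, 0, 0)

print(len(torus_neighbors(origin, dims)))  # prints 10: two links per dimension
```

With every dimension larger than 2, each node has two neighbours per dimension, so 10 links in total for five dimensions.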

    Figure 14: Block diagram of an IBM BlueGene/Q processor chip.

    The crossbar operates at 800 MHz and has a respectable bisection bandwidth of 563 GB/s, although it is not sufficient to feed all cores simultaneously.

    Figure 15: Block diagram of an IBM BlueGene/Q processor core.

    The processor cores operate at a clock speed of 1.6 GHz, almost double that of the former BlueGene/P. As can be seen from the diagram in Figure 15, 4 FMA instructions can be executed per cycle per core, thus delivering a peak speed of 12.8 Gflop/s per core. Per processor, 16 GB of DDR3-1333 memory is available, twice as much per core as for the BlueGene/P. The data are fetched and stored through two memory controllers, each servicing 8 cores, at an aggregate bandwidth of 47.2 GB/s. The core is capable of 4-way instruction issue, both for integer and load/store processing and for the floating-point units. Instruction processing is, however, in-order, unlike in the former BlueGenes. For the floating-point units this potential bottleneck is mitigated by a Permute Unit that enables reordering of strided and otherwise permuted operands so that they can be processed in a SIMD fashion.
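
The peak figures quoted above follow from simple arithmetic, which the short Python sketch below reproduces. It assumes that one FMA counts as two floating-point operations, that only the 16 compute cores contribute to the peak, and that the 47.2 GB/s memory bandwidth is the total over both controllers; the function and variable names are hypothetical and serve only this example.

```python
# Figures taken from the text.
CLOCK_GHZ = 1.6          # core clock speed
FMA_PER_CYCLE = 4        # FMA instructions per cycle per core
COMPUTE_CORES = 16       # cores used for computation
MEMORY_GB = 16           # memory per processor
MEMORY_BW_GBS = 47.2     # memory bandwidth, assumed to be the total for both controllers
CROSSBAR_BW_GBS = 563.0  # bisection bandwidth of the crossbar

FLOPS_PER_FMA = 2        # assumption: a fused multiply-add counts as 2 flops


def peak_gflops_per_core(clock_ghz: float, fma_per_cycle: int) -> float:
    """Peak floating-point rate of one core in Gflop/s."""
    return clock_ghz * fma_per_cycle * FLOPS_PER_FMA


core_peak = peak_gflops_per_core(CLOCK_GHZ, FMA_PER_CYCLE)
chip_peak = core_peak * COMPUTE_CORES

print(f"peak per core: {core_peak:.1f} Gflop/s")  # 12.8, as quoted in the text
print(f"peak per chip: {chip_peak:.1f} Gflop/s")  # 204.8 for 16 compute cores
print(f"memory per compute core: {MEMORY_GB / COMPUTE_CORES:.1f} GB")
print(f"memory bytes per flop:   {MEMORY_BW_GBS / chip_peak:.2f}")
print(f"crossbar bytes per flop: {CROSSBAR_BW_GBS / chip_peak:.2f}")
```

The bytes-per-flop ratios give a rough sense of why neither the crossbar nor the memory system can keep all cores supplied with operands at peak speed.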