The BlueGene/Q processor

The BlueGene/Q processor, like its predecessors in the BlueGene/L and /P, uses a variant of the PowerPC family, this time the A2 processor that IBM introduced last year as a network processor. The technology used is 45 nm SOI. Unlike the two earlier BlueGene processors, this one is a full 64-bit processor. Where the BlueGene/P has 4 cores per processor, the BlueGene/Q processor contains no fewer than 18 cores. Sixteen of these are used for computation and one for OS tasks, while core 18 acts as a spare that can in principle be activated when one of the other cores fails, thus adding to the resiliency of the chip (although IBM has made no promise that it will actually be a hot spare). Figure 14 shows a scheme of the chip layout. In contrast to the earlier BlueGene models there is now only one type of network: a 5-D torus. The connections to the outside world are too complicated to depict in this diagram, and further packaging details are deferred to the discussion of the system in the section on the IBM BlueGene system.

Figure 14: Block diagram of an IBM BlueGene/Q processor chip.

The crossbar operates at 800 MHz and has a respectable bisection bandwidth of 563 GB/s, although this is not sufficient to feed all cores simultaneously.

Figure 15: Block diagram of an IBM BlueGene/Q processor core.

The processor cores operate at a clock speed of 1.6 GHz, almost double that of the former BlueGene/P. As can be seen from the diagram in Figure 15, 4 FMA instructions can be executed per cycle per core. Counting each FMA as two floating-point operations, this delivers a peak speed of 12.8 Gflop/s per core. Per processor 16 GB of 1333 MHz DDR3 memory is available, twice as much per core as for the BlueGene/P. The data are fetched and stored through two memory controllers, each servicing 8 cores at an aggregate bandwidth of 47.2 GB/s. The core is capable of 4-way instruction issue, both for integer and load/store processing and for the floating-point units.
Instruction processing is, however, in-order, unlike in the former BlueGenes. For the floating-point units this potential bottleneck is mitigated by a Permute Unit that reorders strided and otherwise permuted operands so that they can be processed in a SIMD way.
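
The peak figures quoted above follow from simple arithmetic, which the short Python sketch below reproduces. It is only an illustration under stated assumptions: one FMA is counted as two floating-point operations, only the 16 compute cores contribute to the chip peak, and the 47.2 GB/s is read as the total for both memory controllers together. The function names are made up for this example.

```python
# Figures quoted in the text.
CLOCK_GHZ = 1.6            # core clock speed
FMA_PER_CYCLE = 4          # FMA instructions per core per cycle
COMPUTE_CORES = 16         # cores used for computation (of the 18 on the chip)
MEMORY_GB = 16             # DDR3 memory per processor
MEM_BANDWIDTH_GBS = 47.2   # assumption: total over both memory controllers

# Assumption: one FMA counts as a multiply plus an add, i.e. 2 flops.
FLOPS_PER_FMA = 2


def peak_gflops_per_core():
    """Peak speed of one core in Gflop/s."""
    return CLOCK_GHZ * FMA_PER_CYCLE * FLOPS_PER_FMA


def peak_gflops_per_chip():
    """Peak speed of the chip, counting only the compute cores."""
    return peak_gflops_per_core() * COMPUTE_CORES


def memory_gb_per_core():
    """Memory available per compute core in GB."""
    return MEMORY_GB / COMPUTE_CORES


def bytes_per_flop():
    """Memory bandwidth relative to peak speed, in bytes per flop."""
    return MEM_BANDWIDTH_GBS / peak_gflops_per_chip()


if __name__ == "__main__":
    print(f"Peak per core:   {peak_gflops_per_core():.1f} Gflop/s")
    print(f"Peak per chip:   {peak_gflops_per_chip():.1f} Gflop/s")
    print(f"Memory per core: {memory_gb_per_core():.1f} GB")
    print(f"Bytes per flop:  {bytes_per_flop():.2f}")
```

Under these assumptions the chip peak comes out at 16 × 12.8 = 204.8 Gflop/s, and the memory system can supply well under one byte per floating-point operation at peak speed.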