AMD Opteron

    AMD Abu Dhabi

    All AMD processors are clones with respect to Intel's x86 Instruction Set Architecture, and the 16-core Opteron variant called "Abu Dhabi" is no exception. It became available in November 2012 and is built with a feature size of 32 nm. The processor is a 2-chip package with a total of 8 units, called "modules" by AMD. Each module has two integer units with 4 instruction pipelines each and two 128-bit floating-point units that together can execute 4 64-bit or 8 32-bit fused multiply-add operations each clock cycle. A diagram of a module is given in Figure 9.

    Figure 9: Block diagram of an AMD Abu Dhabi processor module.

    AMD mentions several reasons for this choice of functional units. One is that, even for compute-intensive tasks, on average more than 50% of the time is spent in integer operations. Also, AMD prefers to double the integer units rather than rely on some form of software multithreading. AMD itself calls this Chip Multithreading (CMT), a term coined by SUN for its T2 multicore processors a few years ago. Another reason for this choice is to keep the power budget of the chip within reasonable bounds. The MMX units are able to execute both SSE4.1 and SSE4.2 instructions as well as Intel-compatible AVX vector instructions. In addition, these units can execute AES instructions for fast encryption and decryption of data, as was already possible in the latest Intel processors.
    The fastest variant employed in HPC, the X6386SE, has a clock frequency of 2.8 GHz with a boost to 3.5 GHz in turbo mode. Because of the composition of the modules, the instruction throughput may vary considerably depending on the workload. As said, the Abu Dhabi chip harbours 8 modules, as shown in Figure 10.

    Figure 10: Block diagram of the AMD Abu Dhabi processor layout.
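
    The module count, the fused multiply-add throughput per module, and the clock frequencies quoted above can be combined into a rough theoretical peak for one processor. The sketch below is only a back-of-the-envelope estimate: it assumes the usual convention of counting one fused multiply-add as 2 floating-point operations, and it assumes that all modules issue at full rate every cycle, which real workloads will rarely achieve. The function and constant names are illustrative, not taken from any AMD tool.

```python
# Rough theoretical peak for one Abu Dhabi processor (illustrative sketch).
# Figures marked "from the text" are quoted in this section; the rest are
# labeled assumptions.

MODULES_PER_PROCESSOR = 8        # from the text
FMA_64BIT_PER_MODULE_CYCLE = 4   # from the text
FMA_32BIT_PER_MODULE_CYCLE = 8   # from the text
FLOPS_PER_FMA = 2                # assumption: multiply + add counted as 2 flops
BASE_CLOCK_GHZ = 2.8             # X6386SE base clock, from the text
TURBO_CLOCK_GHZ = 3.5            # X6386SE turbo clock, from the text


def peak_gflops(fma_per_module_cycle, clock_ghz):
    """Return the theoretical peak in Gflop/s (hypothetical helper)."""
    flops_per_cycle = MODULES_PER_PROCESSOR * fma_per_module_cycle * FLOPS_PER_FMA
    return flops_per_cycle * clock_ghz


# Peak at the base clock for 64-bit and 32-bit operands.
print(f"64-bit, base clock:  {peak_gflops(FMA_64BIT_PER_MODULE_CYCLE, BASE_CLOCK_GHZ):.1f} Gflop/s")
print(f"32-bit, base clock:  {peak_gflops(FMA_32BIT_PER_MODULE_CYCLE, BASE_CLOCK_GHZ):.1f} Gflop/s")

# Upper bound only: assumes every module could run at the turbo clock at once.
print(f"64-bit, turbo clock: {peak_gflops(FMA_64BIT_PER_MODULE_CYCLE, TURBO_CLOCK_GHZ):.1f} Gflop/s")
```

    Under these assumptions the 64-bit peak comes to 179.2 Gflop/s at the base clock and the 32-bit peak to 358.4 Gflop/s. The turbo figure is an optimistic bound, since the boosted clock is not necessarily available to all modules at once.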

    As in the former Interlagos processor, HyperTransport 3.1 is used, but the bandwidth to and from memory has increased from 28.8 GB/s to 40.3 GB/s due to the faster memory interface. As no independent and systematic workload experiments are available for this processor yet, it is not possible to say how the new ideas for this major change in architecture work out in practice.
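
    To put the two memory bandwidth figures in proportion, the relative increase can be worked out directly. The short sketch below uses only the two numbers quoted above; the helper name is hypothetical.

```python
# Relative memory bandwidth increase from Interlagos to Abu Dhabi,
# using only the two figures quoted in the text.

INTERLAGOS_BW_GBS = 28.8   # from the text
ABU_DHABI_BW_GBS = 40.3    # from the text


def relative_increase(old, new):
    """Return the fractional increase from old to new (hypothetical helper)."""
    return (new - old) / old


gain = relative_increase(INTERLAGOS_BW_GBS, ABU_DHABI_BW_GBS)
print(f"Memory bandwidth increase: {gain:.1%}")
```

    This gives an increase of roughly 40% in quoted memory bandwidth. It says nothing about sustained bandwidth, which would have to be measured.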

    The Abu Dhabi is an incremental improvement over the former Interlagos processor: branch prediction and data prefetching have been improved, the L1 TLB size has been doubled from 32 to 64 entries, and the L2 efficiency has also improved. The performance gain is on the order of 10% over the Interlagos, while the energy efficiency has improved by at least 10–20%.