|
AMD Interlagos
All AMD processors are clones with respect to Intel's x86 Instruction Set Architecture, and the 16-core Opteron variant called "Abu Dhabi" is no exception. It became available in November 2012 and is built with a feature size of 32 nm. The processor is a 2-chip package with a total of 8 units, called "modules" by AMD. Each module has two integer units with 4 instruction pipelines each and two 128-bit floating-point units capable of executing 4 64-bit or 8 32-bit fused multiply-add operations each clock cycle. A diagram of a module is given in Figure 9.
Figure 9: Block diagram of an AMD Abu Dhabi processor module.
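The floating-point figures quoted above translate into a peak rate per clock cycle with a little arithmetic. The sketch below is only an illustration under stated assumptions: it reads the "4 64-bit or 8 32-bit" figure as the combined throughput of both 128-bit units in one module, counts a fused multiply-add as 2 floating-point operations, and takes the clock frequency as a hypothetical parameter because the text does not give one.

```python
# Peak floating-point rate estimated from the module description.
# Assumptions (not stated explicitly in the text):
#   - 4 64-bit (or 8 32-bit) FMAs per cycle is the total for one module,
#     i.e. both 128-bit units together;
#   - one fused multiply-add counts as 2 floating-point operations.

MODULES_PER_PROCESSOR = 8   # 2 chips, 8 modules in total (from the text)
FLOPS_PER_FMA = 2           # one multiply plus one add


def peak_flops_per_cycle(fma_per_module_per_cycle=4,
                         modules=MODULES_PER_PROCESSOR):
    """Peak flop/cycle for the whole processor.

    Pass fma_per_module_per_cycle=4 for 64-bit and 8 for 32-bit operands.
    """
    return modules * fma_per_module_per_cycle * FLOPS_PER_FMA


def peak_gflops(clock_ghz, fma_per_module_per_cycle=4):
    """Peak Gflop/s for a given clock frequency in GHz (hypothetical input)."""
    return clock_ghz * peak_flops_per_cycle(fma_per_module_per_cycle)


if __name__ == "__main__":
    print("64-bit flop/cycle:", peak_flops_per_cycle(4))
    print("32-bit flop/cycle:", peak_flops_per_cycle(8))
    # 2.0 GHz is a round number chosen for illustration only.
    print("64-bit Gflop/s at 2.0 GHz:", peak_gflops(2.0))
```

Under these assumptions the 64-bit peak is 64 flop per cycle for the full processor. If the quoted figure were meant per floating-point unit rather than per module, the results would double.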
AMD mentions several reasons for this choice of functional units. One is that,
even for compute-intensive tasks, on average more than 50% of the time is spent
in integer operations. AMD also prefers doubling the integer units to
relying on some form of software multithreading. AMD itself calls this Chip
Multithreading (CMT), a term coined by SUN for its T2 multicore processors a
few years ago. Another reason for this choice is to keep the power budget of the
chip within reasonable bounds. The MMX units are able to execute both SSE4.1 and
SSE4.2 instructions as well as Intel-compatible AVX vector instructions. In
addition, these units are capable of executing AES instructions for fast
en/decryption of data, as was already possible in the latest Intel
processors.
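Whether a given machine actually exposes the instruction set extensions mentioned above can be checked from software. The following is a minimal sketch, assuming a Linux system where /proc/cpuinfo lists the feature flags `sse4_1`, `sse4_2`, `avx` and `aes`; the function names are illustrative and not part of any AMD tool.

```python
# Check a CPU flag list for the extensions named in the text.
# Assumption: Linux-style flag names as they appear in /proc/cpuinfo.

REQUIRED_FLAGS = ("sse4_1", "sse4_2", "avx", "aes")


def check_flags(flags_text):
    """Map each required flag to True/False given a whitespace-separated list."""
    present = set(flags_text.split())
    return {name: name in present for name in REQUIRED_FLAGS}


def read_cpu_flags(path="/proc/cpuinfo"):
    """Return the flag list of the first CPU entry, or '' if none is found."""
    with open(path) as handle:
        for line in handle:
            if line.startswith("flags"):
                # Line format: "flags\t\t: fpu vme de ..."
                return line.split(":", 1)[1]
    return ""


if __name__ == "__main__":
    for name, supported in check_flags(read_cpu_flags()).items():
        print(f"{name:8s} {'yes' if supported else 'no'}")
```

The flags are compared as whole words, so `sse4_1` is not mistaken for a match against a longer or shorter flag name.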
Figure 10: Block diagram of the AMD Abu Dhabi processor lay-out.
As in the former Interlagos processor, HyperTransport 3.1 is used, but the bandwidth to/from memory has increased from 28.8 GB/s to 40.3 GB/s due to the faster memory interface. As no independent and systematic workload experiments are available for this processor yet, it is not possible to say how the new ideas for this major change in architecture work out in practice. The Abu Dhabi is an incremental improvement over the former Interlagos processor: branch prediction and data prefetching have been improved, the L1 TLB size has been doubled from 32 to 64 entries, and the L2 efficiency has improved. The performance gain is on the order of 10% over the Interlagos, while the energy efficiency has improved by at least 10–20%. |