
Sun UltraSPARC III

The UltraSPARC-III is the third generation of the UltraSPARC family and belongs to one of the last RISC processor families, with full 64-bit precision and addressing range. It is built in 0.18 µm CMOS technology and currently runs at a clock frequency of 900 MHz. It is a complete revamp of earlier UltraSPARC designs but remains backward compatible with these older processors. UltraSPARCs are used in all SUN products from workstations to the heavy E10000 servers and also in Fujitsu products like the AP-3000. A block diagram of the UltraSPARC-III is shown in Figure 14.

Figure 14: Block diagram of the UltraSPARC III processor.

The chip is characterised by a large amount of cache of various sorts, as can be seen in the figure. Apart from a 4-way set-associative cache of 64 KB, the Data Cache Unit (DCU) also contains a write cache and a pre-fetch cache, both of 2 KB. The pre-fetch cache is independent of the data cache and can load data when this is deemed appropriate. The write cache defers writes to the L2 cache and so may avoid unnecessary writes of individual bytes until entire cache lines have to be updated. The Instruction Issue Unit (IIU) contains the 32 KB 4-way set-associative instruction cache together with the instruction TLB, which is called the Instruction Translation Buffer in SUN's terminology. The IIU also contains a so-called miss queue that holds instructions that are immediately available to the execute units when a branch has been mispredicted. Branch prediction is fully static in the UltraSPARC-III. It is implemented as a 16 KB table in the IIU that is pipelined because of its size.
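The effect of the write cache described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not a description of the actual hardware: the 32-byte line size, the oldest-first eviction policy, and all class and function names are hypothetical choices made for the illustration.

```python
LINE_SIZE = 32  # bytes per cache line; an assumed value, not given in the text


class WriteCache:
    """Toy write-combining buffer that collects byte writes per cache line."""

    def __init__(self, capacity_lines):
        self.capacity_lines = capacity_lines
        self.lines = {}      # line index -> {byte offset: value}
        self.l2_writes = 0   # number of line-sized writes passed on to L2

    def write_byte(self, address, value):
        line_index, offset = divmod(address, LINE_SIZE)
        # Make room first if a new line is needed and the buffer is full.
        if line_index not in self.lines and len(self.lines) == self.capacity_lines:
            self._evict_oldest()
        self.lines.setdefault(line_index, {})[offset] = value

    def _evict_oldest(self):
        # Dicts keep insertion order, so the first key is the oldest line.
        # Oldest-first eviction is an assumption made for this sketch.
        oldest = next(iter(self.lines))
        del self.lines[oldest]
        self.l2_writes += 1

    def flush(self):
        # Write every remaining line to L2 once.
        self.l2_writes += len(self.lines)
        self.lines.clear()


def count_l2_writes(addresses, capacity_lines=2048 // LINE_SIZE):
    """Count L2 writes caused by one byte write to each address in turn."""
    cache = WriteCache(capacity_lines)
    for address in addresses:
        cache.write_byte(address, 0xFF)
    cache.flush()
    return cache.l2_writes
```

In this model, 64 consecutive byte writes result in two line writes to L2 rather than 64 separate ones, which is the saving the write cache is meant to provide.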
The Integer Execute Unit (IEU) has two Add/Logical Units and a branch unit. Integer adds and multiplies are pipelined but the divide operation is not. It is performed by an Arithmetic Special Unit (not shown in the figure) that does not burden the pipelines for the ALUs. The integer register file is effectively divided into two parts, which SUN calls the Working and the Architectural Register File. Operands are read from and results are stored in the working registers. When an exception occurs, the results to be undone in the working registers are overwritten by those from the architectural file.
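The interplay of the working and architectural register files can be sketched as follows. The text only states that results go to the working registers and are overwritten from the architectural file on an exception; the explicit commit step, the register count of 32, and all names below are assumptions made for this illustration.

```python
class RegisterFiles:
    """Toy model of a working/architectural register file pair."""

    def __init__(self, num_regs=32):  # register count is a hypothetical value
        self.architectural = [0] * num_regs  # committed, known-good state
        self.working = [0] * num_regs        # speculative state used by the ALUs
        self.pending = []                    # registers written but not committed

    def execute(self, reg, value):
        # Results are stored in the working registers first.
        self.working[reg] = value
        self.pending.append(reg)

    def commit(self):
        # Assumed step: completed results are copied to the architectural file.
        for reg in self.pending:
            self.architectural[reg] = self.working[reg]
        self.pending.clear()

    def exception(self):
        # Results to be undone are overwritten by the architectural values.
        for reg in self.pending:
            self.working[reg] = self.architectural[reg]
        self.pending.clear()
```

A short usage example: after `execute(1, 10)` and `commit()`, a further `execute(1, 99)` followed by `exception()` leaves register 1 at the value 10 in both files.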
The floating-point unit (FPU) has two independent pipelined units for addition and multiplication and a non-pipelined unit for floating-point division and square-root computation, which require on the order of 20 to 25 cycles. The FPU also contains graphics hardware (not shown in Figure 14) that shares the pipelined adder and multiplier with general 64-bit calculations. For the chips delivered at 900 MHz, the theoretical peak performance is 1.8 Gflop/s. It is expected that the UltraSPARC-III technology can be shrunk to reach a clock frequency of 1 GHz by the end of its life cycle.
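The peak performance figure follows from the pipelined adder and multiplier each delivering one result per cycle, i.e. two floating-point operations per cycle. The arithmetic can be checked with a minimal sketch; the function names are hypothetical, and the divide estimate assumes that a non-pipelined unit can start a new operation only after the previous one has finished.

```python
def peak_gflops(clock_mhz, flops_per_cycle=2):
    """Theoretical peak in Gflop/s: one add plus one multiply per cycle."""
    return clock_mhz * flops_per_cycle / 1000.0


def nonpipelined_ops_per_second(clock_mhz, latency_cycles):
    """Upper bound on the rate of a non-pipelined unit (assumed model)."""
    return clock_mhz * 1e6 / latency_cycles


print(peak_gflops(900))    # 900 MHz part
print(peak_gflops(1000))   # expected 1 GHz part
print(nonpipelined_ops_per_second(900, 20))
print(nonpipelined_ops_per_second(900, 25))
```

For 900 MHz this gives the 1.8 Gflop/s quoted above, and 2.0 Gflop/s for a 1 GHz part. Under the stated assumption, the divide and square-root unit would sustain at most roughly 36 to 45 million operations per second.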
The memory controller and the L2 cache controller, together with the L2 cache tags, are all housed on the chip in the External Memory Unit. This shortens the latency of accesses to both memory levels. In addition, both controllers communicate with the System Interface Unit (SIU), which is also on-chip, to keep in touch with the snoop pipe controller in the SIU. The processor has been built with multi-processing in mind, and the snoop controller keeps track of data requests in the whole system to ensure coherency of the caches when required.
The UltraSPARC-III has been around for about a year at the time of writing, and the clock frequency has gone up in that period from 750 to 900 MHz. The next generation will take some time (about a year) to appear and, after the radical redesign in the present generation, will have most of the same characteristics as the current one.





Aad van der Steen
Mon Jul 29 14:17:51 MDT 2002