The computational power of the Hewlett-Packard systems, such as the SuperDome and the V-class and N-class servers, is delivered by the PA-8600 and PA-8700 chips. The processor cores of these chips are essentially the same. However, the PA-8700 is manufactured in 0.18 µm technology, which made it possible to fit a very large 0.75 MB instruction cache and a 1.5 MB data cache on the chip and to raise the clock frequency to 750 MHz. A block diagram of the PA-8700 chip is shown in Figure 9.
A peculiarity of the PA-8x00 chips is the absence of a secondary
cache. Instead, a very large primary cache is implemented: a 0.75 MB
instruction cache and a 1.5 MB data cache. From the PA-8600 on, the
shrinking of the logic has made it possible to put these caches on-chip. The
latency of the caches is two cycles. To ensure that data can be shipped to
the registers every cycle, the load/store units work "out-of-phase":
one unit loads from one half of the data cache while the other
loads from the other half. The Address Reorder Buffer sets the priority
for the loads and tries to load from alternate halves every cycle.
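
The effect of alternating between the two cache halves can be illustrated with a toy timing model. This is only a sketch under stated assumptions, not a description of the actual PA-8700 pipeline: it assumes that one load can issue per cycle and that a cache half is occupied for the full two-cycle latency of each load. The function name `cycles_to_complete` and the bank numbering are hypothetical.

```python
def cycles_to_complete(bank_sequence, latency=2):
    """Return the cycle at which the last load in the sequence delivers data.

    Toy model (an assumption, not the documented hardware behaviour):
    at most one load issues per cycle, and a cache half ("bank") stays
    busy for `latency` cycles after it accepts a load.
    """
    bank_free_at = {}  # bank id -> first cycle at which it accepts a new load
    cycle = 0          # earliest cycle at which the next load may issue
    finish = 0
    for bank in bank_sequence:
        start = max(cycle, bank_free_at.get(bank, 0))
        finish = start + latency
        bank_free_at[bank] = finish
        cycle = start + 1
    return finish


alternating = [i % 2 for i in range(8)]  # loads alternate between halves 0 and 1
same_half = [0] * 8                      # every load hits the same half

print(cycles_to_complete(alternating))  # 9
print(cycles_to_complete(same_half))    # 16
```

In this model the alternating pattern delivers one load per cycle after the initial two-cycle latency, whereas loads that all target the same half complete only once every two cycles.
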
Like all advanced RISC processors, the PA-8700 has out-of-order
execution. The sequence of instructions is determined by the
instruction reorder buffer (IRB), which contains an ALU buffer that
drives the computational functional units and a memory buffer that
controls the load/store units. When speculative branches have been
mispredicted, the dependent instructions are removed from the IRB and
new candidate instructions replace them. Branch prediction is
controlled through the branch history table (BHT) but, in addition to
this dynamic branch prediction, a static branch prediction can be
performed at the compiler level or by execution traces of former
executions of a program. The BHT was rather small in the predecessors of
the PA-8600 and has been enlarged significantly to get better prediction
results. The Translation Lookaside Buffer (a component of the
load/store units not shown in Figure 9)
has also been enlarged for more effective address translation. In
addition, the new PA-8700 has a pre-fetch capability from the data
cache.
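
Why a larger branch history table tends to give better predictions can be shown with a small model. The sketch below assumes a textbook scheme of 2-bit saturating counters indexed by the branch address modulo the table size. The text does not say that the PA-8x00 BHT is organised this way, so the class `BranchHistoryTable`, the helper `count_mispredictions`, and the table sizes are purely illustrative.

```python
class BranchHistoryTable:
    """Table of 2-bit saturating counters (a textbook scheme, assumed here)."""

    def __init__(self, entries):
        self.entries = entries
        # Counter values 0 and 1 predict "not taken", 2 and 3 predict "taken".
        self.counters = [1] * entries

    def _index(self, address):
        return address % self.entries

    def predict(self, address):
        return self.counters[self._index(address)] >= 2

    def update(self, address, taken):
        i = self._index(address)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def count_mispredictions(table, trace):
    """Run (address, taken) pairs through the table and count wrong predictions."""
    misses = 0
    for address, taken in trace:
        if table.predict(address) != taken:
            misses += 1
        table.update(address, taken)
    return misses


# Two branches: the one at address 0 is always taken, the one at 4 never is.
trace = [(0, True), (4, False)] * 10

small = count_mispredictions(BranchHistoryTable(4), trace)  # both map to entry 0
large = count_mispredictions(BranchHistoryTable(8), trace)  # separate entries
print(small, large)  # 20 1
```

In the 4-entry table both branches share one counter and keep overwriting each other's history, so every prediction is wrong. With 8 entries each branch has its own counter and only the first encounter of the taken branch is mispredicted.
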
As can be seen in Figure 9, there are 2 floating-point
units, each of which can deliver 2 flops per cycle, but only when the
operation is in the axpy form x = x + a·y. HP calls this
a Floating Multiply Accumulate instruction (FMAC). With 2 units at
2 flops per cycle each and a clock frequency of 750 MHz, this leads to
a theoretical peak performance of
3 Gflop/s. However, when the operations occur in another order or
with another composition, only 1 flop per cycle per floating-point unit can
be executed, with a correspondingly lower flop rate.
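
The peak-performance arithmetic can be written out explicitly. The sketch below uses only the figures given in the text (2 floating-point units, 2 flops per cycle per unit for multiply-accumulate operations, 1 flop per cycle otherwise) together with the 750 MHz clock quoted for the PA-8700. The function name `peak_gflops` and its parameter names are hypothetical.

```python
def peak_gflops(clock_mhz, fp_units=2, flops_per_unit_per_cycle=2):
    """Theoretical peak in Gflop/s: clock rate x units x flops per unit per cycle."""
    flops_per_second = clock_mhz * 1e6 * fp_units * flops_per_unit_per_cycle
    return flops_per_second / 1e9


# Multiply-accumulate case: 2 units x 2 flops per cycle at 750 MHz.
fmac_peak = peak_gflops(750)

# Any other operation mix: 1 flop per cycle per unit.
plain_peak = peak_gflops(750, flops_per_unit_per_cycle=1)

print(fmac_peak, plain_peak)  # 3.0 1.5
```

The same function can be evaluated for other clock frequencies, for example those of the PA-8600, to compare peak rates across chip generations.
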
According to HP's roadmap, at least another two generations of the
PA-8x00 are projected: the PA-8800 and PA-8900, which will be on the market
concurrently with the IA-64 Itanium 2 (McKinley) and Itanium 3
(Deerfield), respectively. After that the PA-RISC family will be
withdrawn to give way to the IA-64 architecture.
Figure 9: Block diagram of an HP PA-RISC 8700 processor.