Hewlett-Packard PA-RISC 8700

The computational power of Hewlett-Packard systems such as the SuperDome and the V-class and N-class servers is delivered by the PA-8600 and PA-8700 chips. The processor cores of these chips are essentially the same. However, the PA-8700 is made in 0.18 µm technology, which made it possible to fit a very large 0.75 MB instruction cache and a 1.5 MB data cache on the chip and to raise the clock frequency to 750 MHz. A block diagram of the PA-8700 chip is shown in Figure 9.

Figure 9: Block diagram of a HP PA-RISC 8700 processor.

A peculiarity of the PA-8x00 chips is the absence of a secondary cache. Instead, a very large primary cache is implemented: a 0.75 MB instruction cache and a 1.5 MB data cache. From the PA-8600 on, the shrinking of the logic has made it possible to put these caches on-chip. The latency of the caches is two cycles. To ensure that data can be shipped to the registers every cycle, the load/store units work "out-of-phase": one unit loads from one half of the data cache while the other loads from the other half. The Address Reorder Buffer sets the priority for the loads and tries to load from alternate halves every cycle.
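The effect of the two cache halves can be illustrated with a toy model. The sketch below is not a description of the actual PA-8700 hardware: the function names, the 64-byte interleave granularity, and the rule that one address bit selects the half are all assumptions made for illustration. It only shows why a stream of loads that alternates between halves can issue two loads per cycle, while a stream that hits a single half cannot.

```python
from collections import deque

# Hypothetical interleave granularity; the source does not state how
# addresses are divided over the two halves of the data cache.
LINE_BYTES = 64


def bank_of(addr):
    """Return 0 or 1: the cache half an address maps to (assumed rule)."""
    return (addr // LINE_BYTES) % 2


def cycles_to_issue(addresses):
    """Count cycles needed when each half can serve one load per cycle."""
    queues = (deque(), deque())
    for addr in addresses:
        queues[bank_of(addr)].append(addr)

    cycles = 0
    while queues[0] or queues[1]:
        # Each cycle, issue at most one pending load per half.
        for queue in queues:
            if queue:
                queue.popleft()
        cycles += 1
    return cycles


alternating = [0, 64, 128, 192]   # halves 0, 1, 0, 1
same_half = [0, 128, 256, 384]    # all in half 0

print(cycles_to_issue(alternating))  # 2
print(cycles_to_issue(same_half))    # 4
```

In this model the four alternating loads need two cycles and the four same-half loads need four, which is the motivation for a reorder buffer that prefers loads from alternate halves.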

Like all advanced RISC processors, the PA-8700 has out-of-order execution. The sequence of instructions is determined by the instruction reorder buffer (IRB), which contains an ALU buffer that drives the computational functional units and a memory buffer that controls the load/store units. When a speculative branch has been mispredicted, the dependent instructions are removed from the IRB and new candidate instructions replace them. Branch prediction is controlled through the branch history table (BHT) but, in addition to this dynamic branch prediction, static branch prediction can be performed at the compiler level or by using execution traces of earlier runs of a program. The BHT was rather small in the predecessors of the PA-8600 and has been enlarged significantly to get better prediction results. The Translation Lookaside Buffer (a component of the load/store units not shown in Figure 9) has also been enlarged for more effective address translation. In addition, the new PA-8700 has a pre-fetch capability from the data cache.
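Why a larger branch history table can give better predictions may be seen from a small simulation. The sketch below uses the common textbook scheme of 2-bit saturating counters indexed by the branch address; the text does not say which scheme the PA-8700 uses, so the class name, the indexing rule, and the table sizes are assumptions for illustration only.

```python
class BranchHistoryTable:
    """Toy BHT with 2-bit saturating counters (assumed scheme, not HP's)."""

    def __init__(self, entries):
        self.entries = entries
        # Counter values 0 and 1 predict "not taken", 2 and 3 predict "taken".
        # Every entry starts at 1 (weakly not taken).
        self.counters = [1] * entries

    def _index(self, pc):
        # Assumed indexing: drop the two low address bits, wrap on table size.
        return (pc >> 2) % self.entries

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def misprediction_count(bht, trace):
    """Run a trace of (pc, taken) pairs and count wrong predictions."""
    misses = 0
    for pc, taken in trace:
        if bht.predict(pc) != taken:
            misses += 1
        bht.update(pc, taken)
    return misses


# Two interleaved branches: one always taken, one never taken.
trace = [(0x1000, True), (0x1004, False)] * 10

print(misprediction_count(BranchHistoryTable(entries=1), trace))  # 20
print(misprediction_count(BranchHistoryTable(entries=2), trace))  # 1
```

With a single entry the two branches share one counter and disturb each other on every update, so all 20 predictions are wrong. With two entries each branch has its own counter and only the first prediction fails.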

As can be seen in Figure 9, there are two floating-point units, each of which can deliver 2 flops per cycle, but only when the operation has the axpy form x = x + a·y. HP calls this a Floating Multiply Accumulate (FMAC) instruction. At a clock frequency of 750 MHz this leads to a theoretical peak performance of 3 Gflop/s. When the operations occur in another order or with another composition, only 1 flop per cycle per floating-point unit can be executed, with a correspondingly lower flop rate.
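The peak figure follows from simple arithmetic: clock frequency times the number of floating-point units times the flops each unit delivers per cycle. The short sketch below works this out for the 750 MHz clock quoted above for the PA-8700; the function names are illustrative only.

```python
def peak_gflops(clock_mhz, fp_units=2, flops_per_unit_per_cycle=2):
    """Theoretical peak in Gflop/s: MHz * units * flops per cycle / 1000."""
    return clock_mhz * fp_units * flops_per_unit_per_cycle / 1000.0


def axpy(a, x, y):
    """x = x + a*y: each element costs one multiply and one add (2 flops)."""
    return [xi + a * yi for xi, yi in zip(x, y)]


# Every operation is a multiply-accumulate: 2 flops per unit per cycle.
fmac_peak = peak_gflops(750)

# Separate adds or multiplies only: 1 flop per unit per cycle.
plain_peak = peak_gflops(750, flops_per_unit_per_cycle=1)

print(fmac_peak)   # 3.0
print(plain_peak)  # 1.5
print(axpy(2.0, [1.0, 2.0], [3.0, 4.0]))  # [7.0, 10.0]
```

The same formula gives 2.2 Gflop/s for a 550 MHz clock, so the 3 Gflop/s peak applies only at 750 MHz and only to code dominated by multiply-accumulate operations.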

According to HP's roadmap, at least another two generations of the PA-8x00 are projected, the PA-8800 and the PA-8900, which will be on the market concurrently with the IA-64 Itanium 2 (McKinley) and Itanium 3 (Deerfield), respectively. After that the PA-RISC family will be withdrawn to give way to the IA-64 architecture.

Aad van der Steen
Mon Jul 29 12:57:12 MDT 2002