
The Sun Fire 3800-15K.

Machine type: RISC-based distributed-memory multi-processor
Models: Fire 3800-15K
Operating system: Solaris (Sun's Unix flavour)
Connection structure: Crossbar (see remarks)
Compilers: Fortran 77, Fortran 90, HPF, C, C++
Vendor's information Web page: http://www.sun.com/servers/highend/sunfire15k/details.html
Year of introduction: 2001

System parameters:

Model: Fire 3800-15K
Clock cycle: 900 MHz
Theor. peak performance
  Per proc. (64-bit): 1.8 Gflop/s
  Maximal (64-bit): 190.8 Gflop/s
Main memory: <= 576 GB
Memory bandwidth
No. of processors: <= 106

Remarks:

In the Fire 15K the processor/memory boards are plugged into a backplane that is an 18×18 flat crossbar. Each board contains four 900 MHz UltraSPARC III processors and a maximum of 32 GB of memory, so normally the maximum number of processors would be 72. However, the 15K in fact contains three of these 18×18 crossbars: one each for data, addresses, and signals. It is possible to sacrifice I/O capacity and use 17 of the 18 slots of the second crossbar for 2-CPU boards without local memory, adding another 34 processors to reach the maximum of 106. Obviously, such a system is less balanced, and such a configuration will normally be chosen only for very specific compute-intensive tasks with small I/O requirements.

Because of the flat crossbar, memory access is uniform. The aggregate bandwidth of the crossbar is 172.8 GB/s, which for 72 processors is equivalent to 2.4 GB/s per processor or, at 900 MHz, about 2.67 B/cycle. So an 8-byte operand needs 3 cycles to be shipped to the processor. Of course, for processors in excess of 72, which are not on the data backplane, the situation is more complicated, and it is hard to estimate what the effective bandwidth would be.
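The configuration and bandwidth figures in these remarks follow from a few multiplications and divisions. The sketch below simply redoes that arithmetic as a consistency check; all input values are taken from the text and the system parameters, while the variable names are illustrative and not part of any Sun tool.

```python
# Consistency check of the Fire 15K figures quoted in the remarks.
# Inputs come from the text; names are illustrative only.

CLOCK_GHZ = 0.9               # 900 MHz UltraSPARC III
PEAK_PER_PROC_GFLOPS = 1.8    # 64-bit theoretical peak per processor
SLOTS = 18                    # slots on each 18x18 crossbar
CPUS_PER_BOARD = 4            # processors on a processor/memory board
MEM_PER_BOARD_GB = 32         # maximum memory per board
EXTRA_SLOTS = 17              # slots usable for 2-CPU boards
CPUS_PER_EXTRA_BOARD = 2      # processors on a memory-less board
CROSSBAR_BW_GBS = 172.8       # aggregate crossbar bandwidth
OPERAND_BYTES = 8             # one 64-bit operand

# Processor counts and memory size.
base_cpus = SLOTS * CPUS_PER_BOARD                # 72
extra_cpus = EXTRA_SLOTS * CPUS_PER_EXTRA_BOARD   # 34
max_cpus = base_cpus + extra_cpus                 # 106
max_memory_gb = SLOTS * MEM_PER_BOARD_GB          # 576

# Theoretical peak of the largest configuration.
peak_gflops = max_cpus * PEAK_PER_PROC_GFLOPS

# Bandwidth per processor, counting only the 72 processors
# that sit on the data crossbar.
bw_per_cpu_gbs = CROSSBAR_BW_GBS / base_cpus
bytes_per_cycle = bw_per_cpu_gbs / CLOCK_GHZ
cycles_per_operand = OPERAND_BYTES / bytes_per_cycle

print(f"processors:            {base_cpus} + {extra_cpus} = {max_cpus}")
print(f"main memory:           {max_memory_gb} GB")
print(f"peak performance:      {peak_gflops:.1f} Gflop/s")
print(f"bandwidth/processor:   {bw_per_cpu_gbs:.2f} GB/s")
print(f"bytes per cycle:       {bytes_per_cycle:.2f}")
print(f"cycles per 8-byte op.: {cycles_per_operand:.1f}")
```

Dividing the per-processor peak by the clock frequency (1.8 Gflop/s at 900 MHz) also gives 2 floating-point results per cycle. Set against roughly one 8-byte operand every 3 cycles from memory, this is one way to see why the configuration with 106 processors is described as less balanced.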

The Fire 15K is a typical SMP machine. The Fortran and C/C++ compilers support shared-memory parallelism through directives in the source code. Sun has joined the OpenMP consortium for standardising the shared-memory programming model.

Measured Performances:
In [32] a speed of 357 Gflop/s is reported for a 4-way cluster of 72-processor machines in solving a dense linear system of unspecified size. The efficiency for this problem is 69%, relative to the theoretical peak of 518.4 Gflop/s for the 288 processors.
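The quoted efficiency can be reproduced from the measured speed and the per-processor peak. The following is a minimal sketch, assuming that efficiency here means measured speed divided by the aggregate theoretical peak; the function name `parallel_efficiency` is hypothetical.

```python
def parallel_efficiency(measured_gflops, n_procs, peak_per_proc_gflops):
    """Return measured speed as a fraction of aggregate theoretical peak."""
    aggregate_peak = n_procs * peak_per_proc_gflops
    return measured_gflops / aggregate_peak


# Figures from the text: a 4-way cluster of 72-processor machines,
# 1.8 Gflop/s peak per processor, 357 Gflop/s measured.
machines = 4
procs_per_machine = 72
n_procs = machines * procs_per_machine          # 288
aggregate_peak = n_procs * 1.8                  # theoretical peak in Gflop/s

efficiency = parallel_efficiency(357.0, n_procs, 1.8)
print(f"{n_procs} processors, peak {aggregate_peak:.1f} Gflop/s, "
      f"efficiency {efficiency:.1%}")
```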




Aad van der Steen
Tue Jul 30 09:18:55 MDT 2002