| Machine type | RISC-based shared-memory multi-processor |
|---|---|
| Models | Fire 3800-15K |
| Operating system | Solaris (Sun's Unix flavour) |
| Connection structure | Crossbar (see remarks) |
| Compilers | Fortran 77, Fortran 90, HPF, C, C++ |
| Vendor's information Web page | http://www.sun.com/servers/highend/sunfire15k/details.html |
| Year of introduction | 2001 |
System parameters:
| Model | Fire 3800-15K |
|---|---|
| Clock cycle | 900 MHz |
| Theor. peak performance | |
| Per proc. (64-bit) | 1.8 Gflop/s |
| Maximal (64-bit) | 190.8 Gflop/s |
| Main memory | <= 576 GB |
| Memory bandwidth | |
| No. of processors | <= 106 |
Remarks:
In the Fire 15K the processor/memory boards are plugged into a backplane that is an 18×18 flat crossbar. Each board contains four 900 MHz UltraSPARC III processors and a maximum of 32 GB of memory, so normally the maximum number of processors would be 72. However, the 15K in fact contains three of these 18×18 crossbars: for data, addresses, and signals. It is possible to sacrifice I/O capacity and use 17 of the 18 slots of the second crossbar for 2-CPU boards without local memory, adding another 34 processors to reach the maximum of 106. Obviously, such a system is less balanced, and such a configuration will normally be chosen only for very specific compute-intensive tasks with small I/O requirements. Because of the flat crossbar, memory access is uniform, and the aggregate bandwidth of the crossbar is 172.8 GB/s. With 72 processors this is equivalent to 2.4 GB/s per processor, or 2.67 B/cycle, so an 8-byte operand needs 3 cycles to be shipped to the processor. Of course, for processors in excess of 72 that are not on the data backplane the situation is more complicated, and it is hard to estimate what the effective bandwidth would be.
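The arithmetic behind these figures can be checked with a short sketch. The Python below is only an illustration: the constant and function names are hypothetical, the input values are those quoted in the table and remarks above, and the per-processor bandwidth is assumed to be the aggregate crossbar bandwidth divided evenly over the 72 processors on the data backplane.

```python
# Figures quoted in the text; the names are illustrative, not Sun's.
SLOTS = 18                  # slots on each 18x18 crossbar
CPUS_PER_BOARD = 4          # UltraSPARC III processors per CPU/memory board
MEM_PER_BOARD_GB = 32       # maximum memory per board
EXTRA_BOARDS = 17           # I/O slots given up for 2-CPU boards
CPUS_PER_EXTRA_BOARD = 2
CLOCK_GHZ = 0.9             # 900 MHz
PEAK_PER_CPU_GFLOPS = 1.8
CROSSBAR_BW_GBS = 172.8     # aggregate data-crossbar bandwidth
OPERAND_BYTES = 8           # one 64-bit operand


def fire_15k_figures():
    """Recompute the headline numbers from the quoted inputs."""
    base_cpus = SLOTS * CPUS_PER_BOARD
    max_cpus = base_cpus + EXTRA_BOARDS * CPUS_PER_EXTRA_BOARD
    # Assumption: bandwidth is shared evenly by the processors
    # that sit on the data backplane.
    bw_per_cpu = CROSSBAR_BW_GBS / base_cpus
    bytes_per_cycle = bw_per_cpu / CLOCK_GHZ
    return {
        "base_cpus": base_cpus,
        "max_cpus": max_cpus,
        "max_memory_gb": SLOTS * MEM_PER_BOARD_GB,
        "peak_gflops": max_cpus * PEAK_PER_CPU_GFLOPS,
        "bw_per_cpu_gbs": bw_per_cpu,
        "bytes_per_cycle": bytes_per_cycle,
        "cycles_per_operand": OPERAND_BYTES / bytes_per_cycle,
    }


if __name__ == "__main__":
    for name, value in fire_15k_figures().items():
        print(f"{name}: {value:.2f}")
```

The sketch says nothing about the 34 processors on the memory-less boards, for which, as noted, the effective bandwidth is hard to estimate.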
The Fire 15K is a typical SMP machine with provisions for shared-memory parallelism in the Fortran and C(++) compilers by directives in the source code. Sun has joined the OpenMP consortium for standardising the shared-memory programming model.
Measured Performances:
In [32] a speed of 357 Gflop/s is reported for a 4-way cluster of 72-processor machines in solving a dense linear system of unspecified size. The efficiency for this problem is 69%.
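The quoted efficiency is consistent with the peak figures in the table. The following Python sketch makes this explicit, under two assumptions that the source does not state: efficiency is taken as measured speed divided by theoretical peak, and each processor in the cluster has the 1.8 Gflop/s peak of the 900 MHz part. The function name is hypothetical.

```python
def cluster_efficiency(measured_gflops, machines, cpus_per_machine,
                       peak_per_cpu_gflops):
    """Return measured speed as a fraction of theoretical peak."""
    peak_gflops = machines * cpus_per_machine * peak_per_cpu_gflops
    return measured_gflops / peak_gflops


if __name__ == "__main__":
    # 4 machines x 72 processors x 1.8 Gflop/s (assumed per-CPU peak)
    eff = cluster_efficiency(357.0, machines=4, cpus_per_machine=72,
                             peak_per_cpu_gflops=1.8)
    print(f"efficiency: {eff:.1%}")
```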