| Machine type | RISC-based SMP system. |
|---|---|
| Models | AlphaServer GS80, GS160, GS320. |
| Operating system | Tru64 Unix (Compaq's flavour of Unix). |
| Connection structure | Variable (see remarks) |
| Compilers | Fortran 77, Fortran 90, HPF, C, C++. |
| Vendor's information Web page | http://www.digital.com/products/quickspecs/10643_na/10643_na.html |
| Year of introduction | 1999. |
System parameters:
| Model | GS80 | GS160 | GS320 |
|---|---|---|---|
| Clock cycle | 1 GHz | 1 GHz | 1 GHz |
| Theor. peak performance | | | |
| Per Proc. (Gflop/s) | 2 | 2 | 2 |
| Maximal (Gflop/s) | 16 | 32 | 64 |
| Memory | <= 64 GB | <= 128 GB | <= 256 GB |
| No. of processors | <= 8 | <= 16 | <= 32 |
| Memory bandwidth | | | |
| Processor/Memory | 1.75 GB/s | 1.75 GB/s | 1.75 GB/s |
| Aggregate bandwidth | 14.3 GB/s | 28.5 GB/s | 57 GB/s |
Remarks:
The GS series is a family of SMP servers built around the fastest
Alpha 21264 processor currently available, at 1 GHz. The systems are
built from "Quad Building Blocks" (QBBs), blocks of 4 processors. The
GS80 can house 2 of these blocks, while the largest configuration, the
GS320, has up to 32 processors in 8 QBBs. The processors in a QBB have
access to
the memory via a crossbar with an aggregate bandwidth of 7.0 GB/s. This
means that for each individual processor the bandwidth is 1.75 GB/s, or
1.75 bytes per cycle at 1 GHz, which is slightly less than a quarter of
an 8-byte operand per cycle. The QBBs are in turn connected by a
crossbar with the same bandwidth per QBB, which amounts to an aggregate
bandwidth of 57 GB/s for the largest GS configuration.
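
The bandwidth arithmetic in these remarks can be checked with a minimal sketch. The constants are the figures quoted above; the function names are illustrative only, and the equal sharing of the crossbar among the 4 processors of a QBB is an assumption of the sketch.

```python
# Figures quoted in the table and remarks; names are illustrative only.
CLOCK_HZ = 1.0e9        # 1 GHz clock
QBB_BANDWIDTH = 7.0e9   # bytes/s through one QBB crossbar
PROCS_PER_QBB = 4
OPERAND_BYTES = 8       # one 64-bit floating-point operand


def per_processor_bandwidth(qbb_bandwidth=QBB_BANDWIDTH, procs=PROCS_PER_QBB):
    """Bytes/s per processor, assuming the crossbar is shared equally."""
    return qbb_bandwidth / procs


def operands_per_cycle(bandwidth, clock_hz=CLOCK_HZ, operand_bytes=OPERAND_BYTES):
    """Number of 8-byte operands that can be delivered per clock cycle."""
    return bandwidth / clock_hz / operand_bytes


def aggregate_bandwidth(n_qbbs, qbb_bandwidth=QBB_BANDWIDTH):
    """Bytes/s summed over all QBBs, assuming simple linear scaling."""
    return n_qbbs * qbb_bandwidth


if __name__ == "__main__":
    bw = per_processor_bandwidth()
    print(f"per processor: {bw / 1e9:.2f} GB/s")
    print(f"operands per cycle: {operands_per_cycle(bw):.3f}")
    for model, n_qbbs in (("GS80", 2), ("GS160", 4), ("GS320", 8)):
        print(f"{model}: {aggregate_bandwidth(n_qbbs) / 1e9:.1f} GB/s")
```

With the rounded 7.0 GB/s per QBB, the linear estimate gives 14, 28 and 56 GB/s. The table quotes 14.3, 28.5 and 57 GB/s, so the per-QBB figure is presumably rounded down from a slightly higher value.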
Because of their SMP character, users can employ OpenMP for
shared-memory parallelisation on the GS systems on up to 32 processors
in the GS320. MPI can of course also be used, along with the full range
of Compaq compilers.
Measured Performances: In [6] a performance of 47.1 Gflop/s is reported for a 32-processor GS320 system in solving a linear system of order 40,000, an efficiency of 73.5%. Moreover, ES40-based GS320s at a clock frequency of 731 MHz have been clustered 2-way and 4-way, which yielded speeds of 63.8 and 87.5 Gflop/s, respectively. As the internode bandwidth of the clusters is markedly lower, the efficiencies dropped accordingly to 68.2% and 46.7%, respectively.
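
The efficiencies quoted here are the ratio of measured speed to theoretical peak, and a short sketch can reproduce them. It assumes 2 floating-point results per cycle per processor (implied by the table: 2 Gflop/s at 1 GHz), and it assumes that the 2-way and 4-way clusters contain 64 and 128 processors, respectively. The text does not state the processor counts, and the function names are illustrative.

```python
FLOPS_PER_CYCLE = 2  # implied by the table: 2 Gflop/s per processor at 1 GHz


def peak_gflops(n_procs, clock_ghz, flops_per_cycle=FLOPS_PER_CYCLE):
    """Theoretical peak performance in Gflop/s."""
    return n_procs * clock_ghz * flops_per_cycle


def efficiency(measured_gflops, n_procs, clock_ghz):
    """Measured speed as a fraction of theoretical peak."""
    return measured_gflops / peak_gflops(n_procs, clock_ghz)


# (label, measured Gflop/s, processors, clock in GHz)
# The cluster processor counts are assumptions (2 x 32 and 4 x 32).
runs = [
    ("GS320, 32 processors, 1 GHz", 47.1, 32, 1.0),
    ("2-way cluster, 731 MHz", 63.8, 64, 0.731),
    ("4-way cluster, 731 MHz", 87.5, 128, 0.731),
]

if __name__ == "__main__":
    for label, measured, n_procs, clock_ghz in runs:
        peak = peak_gflops(n_procs, clock_ghz)
        eff = efficiency(measured, n_procs, clock_ghz)
        print(f"{label}: peak {peak:.1f} Gflop/s, efficiency {eff:.1%}")
```

Under these assumptions the computed values agree with the quoted 73.5%, 68.2% and 46.7% to within about 0.1 percentage point, which suggests the assumed processor counts match those used in [6].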