The Cray Inc. SV1ex.

Machine type	Shared-memory multi-vector processor.
Models	SV1ex-1A, SV1ex-1, SV1ex-4 (cluster).
Operating system	UNICOS (Cray Unix variant).
Connection structure	Crossbar.
Compilers	Fortran 90, C, C++, Pascal, ADA.
Vendor's information Web page	www.cray.com/products/systems/craysv1/
Year of introduction	2000.

System parameters:

Model	Cray SV1ex-1A	Cray SV1ex-1	Cray SV1ex-4
Clock cycle	500 MHz	500 MHz	500 MHz
Theor. peak performance
Per Proc. (64 bits)	2/8 Gflop/s	2/8 Gflop/s	2/8 Gflop/s
Maximal	32 Gflop/s	64 Gflop/s	256 Gflop/s
Memory	<= 32 GB	<= 96 GB	<= 384 GB
No. of processors	8-16	8-32	32-128
Memory bandwidth
Memory-Cache	6.4 GB/s	6.4 GB/s	6.4 GB/s
Cache-CPU	14.4 GB/s	14.4 GB/s	14.4 GB/s
Aggregate	25.6 GB/s	51.2 GB/s	204.8 GB/s
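The "Maximal" and "Aggregate" rows follow from the per-CPU and per-board figures. The sketch below re-derives them as a consistency check. It assumes the largest processor count listed for each model, and the 4 CPUs per board and 4 floating-point operations per cycle stated in the remarks; all names are made up for illustration.

```python
# Illustrative check of the table rows; names are not from any Cray tool.
CLOCK_HZ = 500e6          # clock cycle row: 500 MHz
FLOPS_PER_CYCLE = 4       # per CPU, as stated in the remarks
CPUS_PER_BOARD = 4        # as stated in the remarks
BOARD_BW_GBS = 6.4        # Memory-Cache bandwidth per processor board

# Largest processor count listed for each model.
MAX_CPUS = {"SV1ex-1A": 16, "SV1ex-1": 32, "SV1ex-4": 128}


def peak_gflops(n_cpus):
    """Theoretical peak in Gflop/s for n_cpus single CPUs."""
    return n_cpus * FLOPS_PER_CYCLE * CLOCK_HZ / 1e9


def aggregate_bw_gbs(n_cpus):
    """Aggregate memory bandwidth in GB/s, one memory interface per board."""
    return n_cpus / CPUS_PER_BOARD * BOARD_BW_GBS


for model, n in MAX_CPUS.items():
    print(f"{model}: {peak_gflops(n):.0f} Gflop/s, {aggregate_bw_gbs(n):.1f} GB/s")
```

Under these assumptions the results agree with the table: 32, 64 and 256 Gflop/s, and 25.6, 51.2 and 204.8 GB/s.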

Remarks:

The Cray SV1ex series is a "midlife kicker" that bridges the gap between the Cray SV1, which appeared in 1998, and the SV2, which is expected to appear in 2002. The SV1ex machines are essentially identical to the SV1s, except that the clock frequency has been raised from 300 to 500 MHz. This raises the single-processor peak performance from 1.2 to 2 Gflop/s. Furthermore, the speed of the memory has increased by a factor of two with respect to the SV1.

The Cray SV1(ex) is the successor to both the CMOS-based Cray J90 and the Cray T90, which was based on ECL technology. The SV1ex systems are CMOS-based and therefore much cheaper to manufacture than the ECL-based systems. In this respect Cray has followed the trend set by Fujitsu and NEC a few years ago with their vector systems (see the Fujitsu VPP5000 and the NEC SX-6). The Cray vector processor tradition has also been followed in that the SV1ex series uses its own Cray-specific floating-point format instead of the IEEE 754 standard.

The single-cabinet configurations come in two sizes, the SV1ex-1A and the SV1ex-1, which can house 4 and 8 processor boards, respectively. Each processor board contains 4 CPUs, each of which can deliver a peak rate of 4 floating-point operations per cycle, amounting to a theoretical peak performance of 2 Gflop/s per CPU. However, 4 CPUs can be coupled across CPU boards to form a so-called Multi Streaming Processor (MSP), a processing unit with an effective theoretical peak performance of 8 Gflop/s. The reconfiguration into MSPs and/or single CPUs can be done dynamically as the workload dictates. The vector start-up time for single CPUs is smaller than for MSPs, so for short vectors single CPUs may be preferable, while for programs containing long vectors the MSPs should be advantageous. The number of possible combinations is large, but at least 8 CPUs must be configured as single 2 Gflop/s CPUs. So a full SV1ex-1 cabinet may be configured as 32 single 2 Gflop/s CPUs or as 1 to 6 MSPs with the remaining processors as single CPUs.
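The configuration rule can be made concrete with a small enumeration. This is only a sketch of the arithmetic described above (4 CPUs per MSP, at least 8 CPUs kept as single CPUs); the function and constant names are hypothetical and do not correspond to any Cray configuration utility.

```python
# Enumerate the MSP / single-CPU splits allowed by the rule in the text.
CPUS_PER_MSP = 4        # an MSP couples 4 CPUs
MIN_SINGLE_CPUS = 8     # at least 8 CPUs must stay single


def msp_configurations(total_cpus):
    """Return a list of (n_msps, n_single_cpus) pairs for a cabinet."""
    max_msps = max(0, (total_cpus - MIN_SINGLE_CPUS) // CPUS_PER_MSP)
    return [(m, total_cpus - m * CPUS_PER_MSP) for m in range(max_msps + 1)]


for msps, singles in msp_configurations(32):
    # 8 Gflop/s per MSP and 2 Gflop/s per single CPU, as quoted in the text.
    peak = msps * 8 + singles * 2
    print(f"{msps} MSPs + {singles} single CPUs -> {peak} Gflop/s peak")
```

For a full 32-CPU cabinet this yields 0 to 6 MSPs. The total theoretical peak is 64 Gflop/s in every case, so the choice between MSPs and single CPUs affects how the peak is partitioned, not its size.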

Another feature of the SV1ex is a combined scalar and vector cache of 256 KB per CPU. This cache is important because the memory bandwidth of 6.4 GB/s per CPU board amounts to only about 1.6 eight-byte operands per cycle, whereas the cache can ship 4 operands per cycle to a CPU. The relative memory bandwidth is much smaller than what was offered in former Cray systems, which makes the cache all the more important. The bandwidth of a memory interface is divided over the 4 processors on a board on an as-needed basis. On the assumption that not all processors require the maximum amount of data all the time, the average data requirement of the processor boards is hoped to be met.
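The conversion from bandwidth to operands per cycle is simple arithmetic. The sketch below carries it out with the table values (500 MHz clock, 6.4 GB/s per board, 14.4 GB/s cache to CPU), taking 1 GB as 10^9 bytes; that convention and the names used are assumptions made for illustration.

```python
# Convert a bandwidth figure into eight-byte operands per clock cycle.
CLOCK_HZ = 500e6        # 500 MHz clock from the system parameters
WORD_BYTES = 8          # 64-bit operands
CPUS_PER_BOARD = 4


def operands_per_cycle(bandwidth_gbs):
    """Eight-byte operands delivered per clock cycle at the given GB/s."""
    return bandwidth_gbs * 1e9 / WORD_BYTES / CLOCK_HZ


board = operands_per_cycle(6.4)       # memory to one processor board
per_cpu = board / CPUS_PER_BOARD      # if all 4 CPUs draw data at once
cache = operands_per_cycle(14.4)      # cache to one CPU

print(f"memory per board: {board:.1f} operands/cycle")
print(f"memory per CPU:   {per_cpu:.1f} operands/cycle")
print(f"cache per CPU:    {cache:.1f} operands/cycle")
```

The results land near the figures quoted above; small differences presumably come from rounding or from the clock rate assumed. The point is the ratio: when all four CPUs on a board stream from memory simultaneously, each receives well under one operand per cycle, which is several times less than the cache can supply.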

As with the NEC SX-6, single cabinets can be combined to form a cluster (a Supercluster in Cray's terminology) by means of a so-called GigaRing. The GigaRing, which is also used to couple I/O subsystems, consists of two counter-rotating rings with a bandwidth of 1 GB/s each. Whereas the systems in a cabinet are SM-MIMD systems, a multi-cabinet Supercluster is a DM-MIMD system and can be operated in parallel only through a parallel programming model such as MPI or HPF. The SV1ex-4 is a standard configuration offered by Cray Inc., but larger clusters with up to 32 SV1ex-1 nodes are also possible.

Measured Performances: In [6] a performance of 48.17 Gflop/s is reported for solving a dense linear system of size 40,320 on a 32-processor machine. This amounts to an efficiency of 75.3%.
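The efficiency figure is the measured rate divided by the theoretical peak of the 32 processors. The sketch below reproduces it and adds a rough run-time estimate that uses the conventional LINPACK operation count of 2/3 n^3 + 2 n^2. The run time is an inference and is not reported in [6]; the function names are illustrative.

```python
# Reproduce the quoted efficiency and estimate the solution time.
PEAK_PER_CPU_GFLOPS = 2.0   # per-CPU theoretical peak from the table


def efficiency(measured_gflops, n_cpus):
    """Fraction of theoretical peak achieved."""
    return measured_gflops / (n_cpus * PEAK_PER_CPU_GFLOPS)


def linpack_seconds(n, measured_gflops):
    """Estimated run time using the standard LINPACK operation count."""
    flops = 2.0 / 3.0 * n**3 + 2.0 * n**2
    return flops / (measured_gflops * 1e9)


eff = efficiency(48.17, 32)
print(f"efficiency: {eff:.1%}")   # 75.3%
print(f"estimated run time: {linpack_seconds(40320, 48.17):.0f} s")
```

Under these assumptions the solve would have taken on the order of 15 minutes.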

Aad van der Steen
Mon Jul 29 15:18:30 MDT 2002