Systems under development

Although we mainly want to discuss real, marketable systems and not experimental, special-purpose, or even speculative machines, we want to include a section on systems that are in an advanced stage of development and have a fair chance of reaching the market. For inclusion in section 3 we set the rule that the system described there should be on the market within a period of 6 months from announcement. The systems described in this section will in all probability appear within one year from the publication of this report. However, there are vendors who do not want to disclose any specific data on their new machines until they actually begin to ship them. We recognise the wishes of such vendors (it is generally wise not to stretch the expectations of potential customers too long) and will not disclose such information.

Below we discuss systems that may lead to commercial systems to be introduced on the market between somewhat more than half a year and a year from now. The commercial systems that result from them will sometimes deviate significantly from the original research models, depending on the way the development is done (the approaches in Japan and the USA differ considerably in this respect) and on the user group that is targeted.

A development that has proven to be of significance is the introduction of Intel's IA-64 Itanium processor family. At the moment four vendors already offer Itanium 2-based systems, and it is known that HP will end the marketing of its Alpha- and PA-RISC-based systems in favour of the Itanium processor family. At the same time SGI will stop the further development of MIPS processor-based machines. This means that in a few years only AMD, IBM, Intel, and Sun will produce RISC-like processors for HPC systems, which will make the HPC system field much less diverse and interesting. On the other hand, the shock that was caused in the USA by the advent of the Japanese Earth Simulator system may help in refuelling the funding of alternative processor and computer architecture research. Indeed, some initiatives in that direction are already under way, but these will not bear really new results within one or two years (except maybe the IBM Blue Gene, see below).

Cray Inc.

At the end of 2002 the next-generation vector processor from Cray Inc., the X1, was ready to ship. It built on the technology found in the Cray SV-1s. Cray widely publicises a roadmap of future systems extending to around 2010. It remains to be seen how much of it will be realised; however, at least two of these systems have already reached the market. The first is the Cray X1E, which is nothing more than the present X1 system in which the clock frequency of the processor is raised from 800 MHz to 1.2 GHz. The other is the Cray XT3, a commercialised version of the AMD Opteron-based (11,648 processors) Red Storm machine that Cray is presently building for Sandia Laboratories. There is much interest in this type of system because of the cheap basic processor and the fast network based on AMD's HyperTransport and Cray's SeaStar router ASIC.
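
Since the X1E is described as an X1 with only the clock frequency raised, a first-order estimate of its gain in peak speed is simply the ratio of the two clock frequencies. The sketch below assumes that the number of flops per cycle and the processor count are unchanged; the function name `peak_scaling` is ours, chosen for illustration only.

```python
def peak_scaling(old_clock_ghz: float, new_clock_ghz: float) -> float:
    """Ratio of new to old peak speed when only the clock frequency changes.

    Assumes flops per cycle and processor count are unchanged, so peak
    speed scales linearly with the clock frequency.
    """
    return new_clock_ghz / old_clock_ghz


# X1 (800 MHz) -> X1E (1.2 GHz), frequencies as given in the text
ratio = peak_scaling(old_clock_ghz=0.8, new_clock_ghz=1.2)
print(f"X1E/X1 peak-speed ratio: {ratio:.2f}")  # 1.50
```

This is an upper bound on the gain: codes limited by memory bandwidth may see less if the memory system does not speed up by the same factor.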

Further away lies the BlackWidow, a follow-on to the X1E, scheduled for 2007. The system will have a new type of infrastructure, code name "Rainier". As the same infrastructure will be used for the next-generation XT3, it will be possible to combine vector processors and AMD processors in one configuration and let them share common resources like I/O processors. Most likely such a system will not yet operate as one integrated system in the first phase, but this is the ultimate goal to be reached in the Cascade system in 2010. In that system it should be possible to freely add either type of processor, as well as FPGAs and massively multi-threaded processors.

The latter type of processor is still being developed by Cray as a follow-on to the former MTA-2 systems, of which very few were actually sold. Still, there is considerable interest in this processor architecture, especially in national security circles. The MTA-3, code name Eldorado, will have the macro-architecture of the Cray XT3 but with the AMD processors on the board replaced by MTA processors running at a clock frequency of 500 MHz instead of the 200 MHz of the MTA-2. At the time of writing prototypes exist, but the product is kept low-profile and it is not known whether it will become a publicly marketed product.

Hewlett-Packard/Intel

HP and Intel will have a great influence in the next few years with their Itanium processor family. A dual-core processor based on the Itanium is already on (or beyond) the drawing board and will hit the market in a year or two. As dual-core processors usually deliver relatively less performance per core than their single-core equivalents, the performance improvement will not be spectacular; the system architecture will be much more important. A diversification of the processors themselves may also help to boost performance. Because of HP/Intel's experience with VLIW processors (which the Itanium essentially is), one might expect research to go in the direction of processors with even longer instruction words, possibly also including specialised devices for high-level operations like FFTs or sparse matrix-vector multiplies. When and how such improvements would turn up in future systems is, however, speculative; it will certainly not happen within the next two years. As yet no radically different system architectures are known to be on HP's drawing boards. Instead, HP may try to penetrate further into the cluster field, where it has already installed some large Itanium-based systems.
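
The argument that a dual-core chip need not give a spectacular improvement can be illustrated with a simple roofline-style bound, which is our choice of model and not anything published by HP or Intel: the attainable speed of a chip is the minimum of its compute peak and its memory bandwidth times the kernel's arithmetic intensity (flops per byte fetched from memory). All chip parameters below are hypothetical, chosen only to show the effect of two cores sharing one memory path while each runs at a slightly lower clock.

```python
def attainable_gflops(cores: int, peak_per_core: float,
                      mem_bw_gbs: float, flops_per_byte: float) -> float:
    """Roofline-style upper bound on the speed of one chip, in Gflop/s.

    The chip cannot exceed its compute peak (cores * peak_per_core), nor
    mem_bw_gbs * flops_per_byte for a kernel doing flops_per_byte flops per
    byte fetched from memory. All cores share the same memory bandwidth.
    """
    compute_roof = cores * peak_per_core
    memory_roof = mem_bw_gbs * flops_per_byte
    return min(compute_roof, memory_roof)


# Hypothetical chips: same socket bandwidth (GB/s), dual-core part clocked a bit lower.
SINGLE = dict(cores=1, peak_per_core=6.4, mem_bw_gbs=8.5)
DUAL = dict(cores=2, peak_per_core=6.0, mem_bw_gbs=8.5)

for intensity in (0.25, 4.0):  # flops per byte: memory-bound vs compute-bound kernel
    s = attainable_gflops(flops_per_byte=intensity, **SINGLE)
    d = attainable_gflops(flops_per_byte=intensity, **DUAL)
    print(f"{intensity:4.2f} flop/byte: single {s:.3f}, dual {d:.3f} Gflop/s, gain {d / s:.2f}x")
```

For the low-intensity kernel (the regime of, for example, a sparse matrix-vector multiply) both chips hit the same bandwidth ceiling and the second core adds nothing; for the compute-bound kernel the gain stays below 2 because each core is slower.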

IBM

IBM has been working for some years on its Blue Gene system, of which the first model, the Blue Gene/L, has now been installed in various places. Two follow-ups are planned: the Blue Gene/P with a peak speed of 1 Pflop/s and the Blue Gene/Q with a peak speed of 3 Pflop/s. These systems are hardly meant for the average HPC user, but they may help in finding suitable architectural features for systems for the general market.

Of course the development of the POWERx processors will also make its mark: the POWER5+ processor will have the usual technology-related advantages over its predecessor, and the first prototypes of the POWER6 processor are now being tested. When this processor becomes generally available its clock frequency is expected to be 3.5 to 4 GHz, yielding a performance of 14 to 16 Gflop/s per core. Furthermore, research is under way on how to couple 8 of these cores such that a virtual vector processor with a peak speed of around 120 Gflop/s can be made. This approach is called ViVA (Virtual Vector Architecture). It is reminiscent of Hitachi's SR8000 nodes (which used PowerPC-based processors) or of the MSP processors in the Cray X1E. This development will take some years, also after the POWER6 processor has become available, and will extend to the next generation(s) of the POWERx.
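
The performance figures quoted above are consistent with 4 floating-point operations per cycle per core (14 Gflop/s / 3.5 GHz = 16 Gflop/s / 4 GHz = 4); that factor is inferred from the text, not taken from an IBM specification. The sketch below recomputes the per-core peak and the idealised peak of an 8-core ViVA group, assuming the coupling adds no overhead. The function names are ours, for illustration only.

```python
FLOPS_PER_CYCLE = 4  # inferred from the text: 14 / 3.5 = 16 / 4 = 4


def core_peak_gflops(clock_ghz: float, flops_per_cycle: int = FLOPS_PER_CYCLE) -> float:
    """Peak speed of one core in Gflop/s."""
    return clock_ghz * flops_per_cycle


def viva_peak_gflops(clock_ghz: float, cores: int = 8) -> float:
    """Idealised peak of `cores` coupled cores (no coupling overhead assumed)."""
    return cores * core_peak_gflops(clock_ghz)


for f in (3.5, 4.0):
    print(f"{f} GHz: {core_peak_gflops(f):.0f} Gflop/s/core, "
          f"8-core ViVA group: {viva_peak_gflops(f):.0f} Gflop/s")
```

It prints 14 and 16 Gflop/s per core and 112 and 128 Gflop/s for the 8-core group, which brackets the "around 120 Gflop/s" mentioned for ViVA; a real implementation would deliver less than this ideal sum.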

In addition, the Cell processor, developed with Sony, might become a factor in HPC systems. This processor has 8 computational cores and a control processor. Although it is in the first instance targeted at the gaming industry (hence Sony's interest), numerical experiments have shown it to perform extremely well in numerical work, albeit in 32-bit precision. Future generations could, however, be adapted to numerically intensive work. If similar performance characteristics can be maintained, it could become an important building block for HPC systems.

SGI

Two years ago SGI made it known that it would stop producing its MIPS-based systems. Therefore, the way it can distinguish itself from other vendors that also offer Itanium-based systems has to lie in the macro-architecture of its systems. Improvements will be realised in the speed of the network (NUMAlink3 to NUMAlink4 and beyond) and in systems with a large number of processors in a single system image (SSI). In that respect SGI has a track record with its MIPS-based Origin 3000 systems, which may be extended to its future Altix x000 systems, where at present SSIs of 512 processors are realised.

Further in the future SGI seems to have plans that are more or less similar to Cray's Cascade project: coupling heterogeneous processor sets through its proprietary network, in this case a successor of the NUMAlink4 network architecture. A first step in that direction is the availability of the so-called RASC blades that can be put into the Altix 4700 infrastructure. Each RASC blade features 2 FPGAs that can be used as computational accelerators for certain algorithms in applications. The idea is to further diversify future systems, ultimately into a system with the code name "Ultraviolet". Development of such systems is very costly, so in view of SGI's present difficult financial position it remains to be seen whether such plans will pass the stage of intentions.

Sun

Like Cray and IBM, Sun has been awarded a grant from DARPA to develop so-called high-productivity systems in DARPA's HPCS program. Up till now Sun has concentrated on developing heavily multi-threaded processors, the first product being the Niagara chip and the next the even more multi-threaded Rock processor. The first implementation of the Niagara processor is ready and is sold as the UltraSPARC T1, for instance in the Sun Fire T2000 server. It is, however, not meant for computational work but rather for throughput-oriented Web or I/O servers. The chip harbours 8 CPU cores, each of which is 4-way multi-threaded. Systems based on the Rock processor will not be available before 2008. For its near-future mainstream high-end systems Sun will therefore rely on Fujitsu's SPARC64 processors.
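
Why a heavily multi-threaded chip suits throughput work can be illustrated with a simple textbook model of fine-grained multithreading, which is our assumption and not a description of Niagara's actual pipeline: each thread alternates between a run of useful cycles and a memory stall, and the core switches to another ready thread at no cost. Only the 8 cores and 4 threads per core come from the text; the cycle counts are hypothetical.

```python
CORES, THREADS_PER_CORE = 8, 4  # Niagara figures from the text


def core_utilisation(threads: int, run_cycles: float, stall_cycles: float) -> float:
    """Fraction of cycles a fine-grained multithreaded core does useful work.

    Simple model: each thread runs run_cycles cycles, then waits stall_cycles
    cycles for memory; the core switches to another ready thread at zero cost,
    so with enough threads the stalls are completely hidden.
    """
    return min(1.0, threads * run_cycles / (run_cycles + stall_cycles))


print(f"hardware threads per chip: {CORES * THREADS_PER_CORE}")

# Hypothetical workload: 20 cycles of work per 60-cycle memory stall.
for t in range(1, THREADS_PER_CORE + 1):
    u = core_utilisation(t, run_cycles=20, stall_cycles=60)
    print(f"{t} thread(s)/core: utilisation {u:.2f}, busy cores on chip {CORES * u:.1f}")
```

In this model a single thread keeps its core busy only a quarter of the time, while 4 threads hide the stalls completely. No individual thread runs faster, which fits the positioning of the chip as a throughput server for many independent requests rather than as a computational engine.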