Although we mainly want to discuss real, marketable systems and not experimental, special-purpose, or even speculative machines, we want to include a section on systems that are in an advanced stage of development and have a fair chance of reaching the market. For inclusion in section 3 we set the rule that a system described there should be on the market within 6 months of its announcement. The systems described in this section will in all probability appear within one year of the publication of this report. However, there are vendors who do not want to disclose any specific data on their new machines until they actually begin to ship them. We respect the wishes of such vendors (it is generally wise not to stretch the expectations of potential customers too long) and will not disclose such information.
Below we discuss systems that may lead to commercial systems introduced on the market between somewhat more than half a year and a year from now. The resulting commercial systems will sometimes deviate significantly from the original research models, depending on the way the development is done (the approaches in Japan and the USA differ considerably in this respect) and on the user group that is targeted.
A development that has proved to be significant is the introduction of Intel's IA-64 Itanium processor family. Four vendors are already offering Itanium 2-based systems at the moment and it is known that HP will end the marketing of its Alpha and PA-RISC based systems in favour of the Itanium processor family. At the same time SGI will stop the further development of MIPS processor-based machines. This means that in a few years only AMD, IBM, Intel, and SUN will produce RISC-like processors for HPC systems. This will make the HPC system field much less diverse and interesting. On the other hand, the shock that was caused in the USA by the advent of the Japanese Earth Simulator system may help to refuel the funding of alternative processor and computer architecture research. Indeed, some initiatives in that direction are already under way, but these will not yield really new results within one or two years (except perhaps for the IBM Blue Gene, see below).
At the end of 2002 the next-generation vector processor from Cray Inc., the X1, was ready to ship. It builds on the technology found in the Cray SV-1s. Cray widely publicises a roadmap of future systems extending to around 2010. It remains to be seen how much of it can be realised. However, at least two of these systems are certain to reach the market in 2005. The first is the Cray X1e, which is nothing more than the present X1 system with the clock frequency of the processor raised from 800 MHz to 1.2 GHz. The other is the Cray Strider, a commercialised version of the AMD Opteron-based (11,648 processors) Red Storm machine that Cray is presently building for Sandia Laboratories. There is much interest in this type of system because of the cheap basic processor and the fast network based on AMD's HyperTransport and Cray's SeaStar router ASIC. Further away lies the Black Widow, a follow-on to the X1e, scheduled for 2006. Recently plans have been disclosed for a new type of system, code-named “Rainier”. In this system the inter-processor network is the central part, and nodes of different types can be mixed in this network: vector type (Black Widow), scalar type (AMD or other), and FPGAs/DSPs for special functions. An upgraded form of the Red Storm network seems a very good basis for such a system, but its realisation is still some time away and undoubtedly some technical challenges will have to be met.
HP and Intel will have a great influence in the next few years with their Itanium processor family. A dual-core processor based on the Itanium is already on (or beyond) the drawing board and will hit the market in a year or two. As dual-core processors usually perform somewhat worse per core than their single-core equivalents, the performance improvement will not be spectacular. The system architecture will be much more important. A diversification of the processors themselves may also help to boost performance. Because of HP/Intel's experience with VLIW processors (which the Itanium essentially is), one might expect the research to go in the direction of processors with even longer instruction words, possibly also including specialised devices for high-level operations like FFTs or sparse matrix-vector multiplies. When and how such improvements would turn up in future systems is, however, speculative. It will certainly not happen within the next two years. As yet no radically different system architectures are known to be on HP's drawing boards. Instead, it may try to penetrate further into the cluster field, where it has already installed some large Itanium-based systems.
IBM has been working for some years on its Blue Gene system. The first of these systems, the Blue Gene/L, is likely to appear sometime in 2004. The system will come with 65,536 processors at a 700 MHz clock frequency and two floating-point units per processor. As each floating-point unit should be able to do a fused multiply-add (as in the current POWER4), the system should have a Theoretical Peak Performance of 183.5 Tflop/s. Two follow-ups are planned: the Blue Gene/P with a peak speed of 1 Pflop/s and the Blue Gene/Q with a peak speed of 3 Pflop/s. None of these systems is really meant for the average HPC user, but they may help in finding suitable architectural features for systems for the general market.
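The peak figure quoted for the Blue Gene/L follows from simple arithmetic: processors times clock frequency times floating-point units per processor times operations per unit per cycle. The following is a minimal sketch in Python, assuming that one fused multiply-add per cycle counts as two floating-point operations; the function name and parameter names are hypothetical and chosen only for illustration.

```python
def theoretical_peak_tflops(processors, clock_hz, fpus_per_processor,
                            flops_per_fpu_per_cycle=2):
    """Return the theoretical peak performance in Tflop/s.

    Assumption: flops_per_fpu_per_cycle=2 models one fused multiply-add
    per cycle, counted as two floating-point operations.
    """
    flops = (processors * clock_hz * fpus_per_processor
             * flops_per_fpu_per_cycle)
    return flops / 1e12


# Figures taken from the text: 65,536 processors, 700 MHz, 2 FPUs each.
blue_gene_l = theoretical_peak_tflops(processors=65_536,
                                      clock_hz=700e6,
                                      fpus_per_processor=2)
print(f"Blue Gene/L peak: {blue_gene_l:.1f} Tflop/s")  # 183.5 Tflop/s
```

This is an upper bound only: it assumes every floating-point unit completes a fused multiply-add in every cycle, which real applications will not sustain.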
Of course the further development of the POWER4 processor will also make its mark: the POWER5 processor will have the usual technology-related advantages over its predecessor, but IBM is also considering how to configure these processors such that coupling eight of them yields a virtual vector processor with a peak speed of 60 to 80 Gflop/s. This approach is called ViVA (Virtual Vector Architecture). It is reminiscent of Hitachi's SR8000 processors or the MSP processors in the Cray X1. This road will take some years to travel, even after the POWER5 processor itself becomes available in 2004/2005.
Recently it has become known that SGI will stop producing its MIPS-based systems. Therefore, the difference it would like to make with respect to other vendors that also offer Itanium-based systems will have to lie in the macro-architecture of its systems. Improvements can be realised in the speed of the network (Numalink3 to Numalink4 and beyond) and in systems with a large number of processors in a single system image. In that respect SGI has a track record with its MIPS-based Origin 3000 systems, which may be extended to its future Altix x000 systems.
The SRC company represents a trend that is gathering remarkable speed at the moment: complementing general-purpose processors with (a collection of) FPGAs (Field Programmable Gate Arrays, see also the glossary). This makes it possible to configure such a machine for special user-defined tasks, which would make it, at least in principle, significantly faster than general-purpose processors for the same tasks. SRC proposes a system, the SRC-6, with 256 dual-processor boards containing standard IA-32 processors, each of which is connected to a unit called MAP (Multi-Adaptive Processor) that consists of an FPGA, private memory, a MAP controller, and logic to reconfigure the unit when needed. The MAPs are interconnected by a ring network, and the standard processor boards, which also have their own local memory, are connected through the MAPs to a global memory by a read and a write crossbar.