Although we mainly want to discuss real, marketable systems and not experimental, special-purpose, or even speculative machines, it is good to look ahead a little and try to see what may be in store for us in the near future. Below we discuss systems that may lead to commercial systems to be introduced on the market between about half a year and a year from now. The commercial systems that result will sometimes deviate significantly from the original research models, depending on the way the development is done (the approaches in Japan and the USA differ considerably in this respect) and on the user group that is targeted. A development that was, at the time, of significance was the introduction of Intel's IA-64 Itanium processor family. Six vendors are offering Itanium 2-based systems at the moment (though not for HPC purposes), and HP has ended the marketing of its Alpha and PA-RISC based systems in favour of the Itanium processor family. Likewise, SGI stopped the further development of MIPS processor based machines. This means that the processor base for HPC systems has become very narrow. However, the shock caused in the USA by the advent of the Japanese Earth Simulator system has helped to refuel the funding of alternative processor and computer architecture research, the consequences of which we have seen over the last few years. In the section on accelerators we already noted the considerable interest generated by systems that provide acceleration by means of FPGAs or other special computational accelerators like those from ClearSpeed. In the near future an HPC system can hardly afford not to include such accelerators in its architecture in some way. One also cannot expect general processor and HPC vendors to ignore this trend.
In some way they will either integrate the emerging accelerator capability into their systems (as is in the road maps of, e.g., Cray and SGI, see below), try to incorporate accelerating devices on the chips themselves (as seems to be the way AMD and Intel are going), or provide ways to tightly integrate accelerator hardware with a CPU and memory via a fast direct connection. We briefly review the status of these developments below.
AMD
AMD has taken great pains over the last few years to increase the number of standard general-purpose x86_64 cores on its chips, the present number being 12 in the Magny-Cours processor. At the same time, AMD acquired the GPU manufacturer ATI a few years back, and it stands to reason that AMD somehow wants to combine the technologies of both branches, as it indeed plans to do in its future Fusion program. However, products of this type for use in HPC servers are still some years away (in contrast to those for notebooks and desktop systems, which may appear around the second half of 2011). The first architectural change will come with the AMD Bulldozer chips, which will feature two 128-bit FMAC units for floating-point processing and 2×4 integer units with two L1 caches and a shared L2 cache. In effect it looks much like a dual-core chip, be it that the chip resources are distributed in another way. At this moment it is impossible to make performance predictions for this new architecture, which is radically different from the processors AMD has produced lately. AMD will support the AVX instruction set for vector processing as defined by Intel.
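The FMAC figures above allow a quick nominal count of floating-point operations per clock cycle for one Bulldozer module. The sketch below is a minimal illustration, assuming that each fused multiply-accumulate counts as two operations and that both 128-bit units can issue on every cycle; the helper name `peak_flops_per_cycle` is hypothetical. It yields only a theoretical peak, not the sustained performance that, as noted, cannot yet be predicted.

```python
# Nominal peak floating-point operations per clock cycle for a group of vector units.
# ASSUMPTION: every unit issues one fused multiply-accumulate (FMAC) per cycle on
# all of its lanes, and one FMAC counts as 2 flops (a multiply plus an add).

def peak_flops_per_cycle(units: int, vector_bits: int, element_bits: int,
                         flops_per_op: int = 2) -> int:
    """Return units * lanes * flops_per_op for a single clock cycle."""
    lanes = vector_bits // element_bits  # elements processed per unit per cycle
    return units * lanes * flops_per_op

# Bulldozer module as described in the text: two 128-bit FMAC units.
DP_FLOPS_PER_CYCLE = peak_flops_per_cycle(units=2, vector_bits=128, element_bits=64)
SP_FLOPS_PER_CYCLE = peak_flops_per_cycle(units=2, vector_bits=128, element_bits=32)

print(f"Double precision: {DP_FLOPS_PER_CYCLE} flop/cycle per module")
print(f"Single precision: {SP_FLOPS_PER_CYCLE} flop/cycle per module")
```

Multiplying these counts by a clock frequency would give a peak per module; since the text quotes no clock for Bulldozer, no such figure is derived here.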
Cray Inc.
At the end of 2002 the next-generation vector processor from Cray Inc., the X1, was ready to ship. It built on the technology found in the Cray SV-1s. Cray widely publicises a roadmap of future systems extending to around 2010, primarily based on the Cascade project. This project was started with help from DARPA's High Productivity Computing Systems initiative (HPCS), which has as one of its goals that 10 Pflop/s systems (sustained) should be available by 2010. This entails not only the necessary hardware but also a (possibly new) language to program such systems productively. Cascade was Cray's answer to this initiative. Together with IBM, Cray continues to receive support from the HPCS program (HP, SGI, and SUN have dropped out). Cray seems reasonably on track with its Cascade project, but it has done away with its former idea of a very heterogeneous system that would integrate scalar and vector processors, as in the abandoned XT5h, as well as with its FPGA-accelerated processor boards. However, Cray now plans to integrate nodes that can accommodate GPUs, as most cluster vendors and, e.g., Bull are doing. So heterogeneity is creeping back in a different form. The follow-on systems bear imaginative names like "Baker", which is in fact the Cray XE6, "Granite" and "Marble", ultimately leading to a system that should be able to deliver 10 Pflop/s sustained by 2011. In the systems following the XE6 a successor of the already fast Gemini router is expected: the Aries. Not much is known yet about this router, except that the connection to the processors will no longer be based on HyperTransport but on PCI Express Gen3. This will give Cray the opportunity to use either AMD or Intel processors, whichever suits it best.
Intel-based systems
As mentioned before in this report, the Itanium (IA-64) line of processors has become irrelevant for the HPC area. Intel instead is focussing on its multi-core x86_64 line of general processors, of which the Sandy Bridge will be the next generation, with the server versions to come out in 2011 (consumer versions will turn up late 2010). The Sandy Bridge server version will have at least 8 cores and will feature the AVX instruction set for vector processing in units that are 256 bits wide. Intel is not insensitive to the rapid increase in the use of computational accelerators, GPUs in particular. An answer might be the many-core processors that are part of Intel's future plans. The Larrabee processor that was expected to become available in 2010 was retracted, and instead Intel is exploring rather similar architectures, called Knights Ferry and, later on, Knights Corner, the latter planned to be the first official product whereas the former is presented as a development platform. Intel collectively calls this line of processors Many Integrated Core (MIC) processors. As in the retracted Larrabee, the cores will be connected by a fast ring bus. In the Knights Ferry, 32 cores built in 45 nm technology, each combined with a 512-bit wide vector unit, operate at a clock frequency of about 2–2.5 GHz. The cores are connected by a 1024-bit wide ring. The Knights Corner will be built in 22 nm technology and will contain more than 50 cores. Whether this will be sufficient to ward off the adoption of GPUs as computational accelerators no one can predict for the moment.
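The vector widths quoted above translate directly into the number of operands handled per vector instruction. The minimal sketch below takes the text's Knights Ferry figures (32 cores, 512-bit units, 2–2.5 GHz) at face value and assumes, purely for illustration, one single-precision operation per lane per cycle with no fused multiply-add counted; `lanes` and `aggregate_peak_gflops` are hypothetical helper names.

```python
# Vector lane counts and a crude aggregate peak for the widths mentioned in the text.

def lanes(vector_bits: int, element_bits: int) -> int:
    """Number of elements of size `element_bits` that fit in one vector register."""
    return vector_bits // element_bits

def aggregate_peak_gflops(cores: int, vector_bits: int, element_bits: int,
                          ghz: float, ops_per_lane: int = 1) -> float:
    """cores * lanes * ops per lane per cycle * clock in GHz, giving Gflop/s.
    ASSUMPTION: ops_per_lane defaults to 1, i.e. no FMA is counted."""
    return cores * lanes(vector_bits, element_bits) * ops_per_lane * ghz

for name, bits in (("AVX (Sandy Bridge)", 256), ("MIC (Knights Ferry)", 512)):
    print(f"{name}: {lanes(bits, 64)} double / {lanes(bits, 32)} single lanes")

# Knights Ferry, single precision, at the lower and upper clock quoted in the text.
KF_LOW = aggregate_peak_gflops(cores=32, vector_bits=512, element_bits=32, ghz=2.0)
KF_HIGH = aggregate_peak_gflops(cores=32, vector_bits=512, element_bits=32, ghz=2.5)
print(f"Knights Ferry SP peak: {KF_LOW:.0f}-{KF_HIGH:.0f} Gflop/s")
```

Counting an FMA as two operations (`ops_per_lane=2`) would double these numbers; the instruction mix and clock of an actual product may well differ.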
SGI
Now that SGI has launched its Altix UV systems, it may be expected to continue in this direction. It is probable that, like Cray, SGI will adopt a future generation of PCIe instead of QPI to connect the processors to each other and to the hub, so as to become vendor-independent, but no official plans mention such a transition. It is also possible that nodes will be offered that contain GPUs alongside the general-purpose processors, as SGI already does in its Altix ICE clusters. There are at the moment no announcements in that direction, however.
Energy-efficient developments
There is already considerable activity with regard to building Exaflop/s systems, foremost in the IESP (International Exascale Software Project) [14]. The time frame mentioned for the appearance of such systems is 2019–2020. A main concern, however, is the power consumption of such machines. With current technologies, even with the improvements from reduced feature sizes factored in, it would be roughly in the 120–150 MW range, i.e., the power consumption of a mid-sized city. This is obviously unacceptable, and the power requirement circulating in the IESP discussions is ≤ 20 MW. This has fuelled the interest in alternative processor architectures like those used in embedded systems, and at the moment a research project, "Green Flash", is underway at Lawrence Berkeley Laboratory to build an Exaflop/s system using 20·10⁶ Tensilica Xtensa embedded processors. Besides that, a host of other manufacturers are now exploring this direction, among them ARM, Texas Instruments, Adapteva, Tilera and many others. Although the processor is an important constituent of the power budget, other components are becoming increasingly important, among them memory and the processor interconnect. Directions to decrease the power consumption of these components lie in a transition to some form of non-volatile memory, i.e., memory that does not use energy to maintain its contents, and in the use of photonic interconnects. With respect to the latter, the first step seems to have been taken by IBM with the Blue Waters system (see the section on IBM above). With regard to non-volatile memory there are a number of candidate technologies, like MRAM, RRAM and memristors. MRAM is already used in embedded applications, but at a density too low to be applicable in present systems. RRAM and memristors may become available in a 2–3 year time frame.
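The figures in this paragraph can be combined into a back-of-the-envelope budget. The sketch below takes them at face value (1 Exaflop/s, a 20 MW cap, 120–150 MW with current technology, 20·10⁶ processors) and, as a simplifying assumption, attributes the entire power budget to the processors; since memory and interconnect also draw power, the per-processor wattage it prints is an upper bound. The helper names are hypothetical.

```python
# Back-of-the-envelope exascale budget using the figures quoted in the text.
EXAFLOPS = 1e18                  # target performance: 1 Exaflop/s
POWER_BUDGET_W = 20e6            # IESP target: <= 20 MW
CURRENT_TECH_W = (120e6, 150e6)  # projection with current technology
N_PROCESSORS = 20e6              # Green Flash: 20 million Tensilica Xtensa processors

def required_efficiency(flops: float, watts: float) -> float:
    """Flop/s per watt needed to deliver `flops` within `watts`."""
    return flops / watts

def per_processor(flops: float, watts: float, n: float) -> tuple[float, float]:
    """Split performance and power evenly over n processors.
    ASSUMPTION: all power goes to the processors (upper bound per processor)."""
    return flops / n, watts / n

gflops_per_watt = required_efficiency(EXAFLOPS, POWER_BUDGET_W) / 1e9
reduction = [p / POWER_BUDGET_W for p in CURRENT_TECH_W]
flops_per_proc, watts_per_proc = per_processor(EXAFLOPS, POWER_BUDGET_W, N_PROCESSORS)
gflops_per_proc = flops_per_proc / 1e9

print(f"Required efficiency: {gflops_per_watt:.0f} Gflop/s per watt")
print(f"Power must drop by a factor of {reduction[0]:.1f} to {reduction[1]:.1f}")
print(f"Per processor: {gflops_per_proc:.0f} Gflop/s at most {watts_per_proc:.1f} W")
```

The point of the exercise is the order of magnitude: each embedded processor would have to sustain tens of Gflop/s on roughly a watt, which is why memory and interconnect power matter so much.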
For storage, another major power consumer, SSDs based on Phase Change RAM may replace spinning disks in a few years.