The Fujitsu/Siemens M9000


Machine type RISC-based shared-memory multi-processor
Models M9000-32, M9000-64
Operating system Solaris (Sun's Unix variant)
Connection structure Crossbar
Compilers Parallel Fortran 90, OpenMP, C, C++
Vendor's information web page: http://www.fujitsu-siemens.com/products/unix_servers/sparc_enterprise/sparcent_enterprise.html
Year of introduction 2007

System parameters:

Model                        M9000-32       M9000-64
Clock cycle                  2.52 GHz       2.52 GHz
Theor. peak performance
  Per core (64-bit)          10.1 Gflop/s   10.1 Gflop/s
  Maximal                    1.29 Tflop/s   2.58 Tflop/s
Main memory
  Memory/node                ≤ 128 GB       ≤ 128 GB
  Memory, maximal            ≤ 1 TB         ≤ 2 TB
No. of processor cores       8–64           8–128
Communication bandwidth
  Point-to-point             ≥ 8 GB/s       ≥ 8 GB/s
  Aggregate                  367.5 GB/s     737 GB/s

Remarks

We discuss here only the M9000-32 and M9000-64; the smaller models, such as the M8000, have the same structure but fewer processors. The same models are also available with a somewhat slower dual-core processor at 2.28 or 2.4 GHz. The M9000 systems now represent the high-end servers of Fujitsu/Siemens and Sun, and as such replace both the Fujitsu/Siemens PRIMEPOWER series and Sun's E25K server (see Systems disappeared from the list).

The quad-core SPARC64 VII processors (see the section on the SPARC processor) have a theoretical peak speed of 10.08 Gflop/s per core and are packaged in four-processor CPU Memory Units (CMUs). Apart from the four processors, a CMU also houses part of the total memory, up to 128 GB per CMU. All components of a CMU (CPUs and memory controllers) are directly connected to each other in crossbar fashion. The CMUs, each residing on one board, are in turn connected by a crossbar that links 8 of them in the M9000-32 and 16 in the M9000-64. The M9000-64 is a two-cabinet version of the M9000-32.
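The per-core peak follows directly from the clock rate and the number of floating-point operations completed per cycle. A minimal sketch of the arithmetic, assuming 4 flops per cycle per core (an assumption not stated in the text, but the only rate consistent with the quoted 10.08 Gflop/s at 2.52 GHz):

```python
# Theoretical-peak arithmetic for the M9000 family.
# Assumption (not stated in the text): each core completes 4 floating-point
# operations per clock cycle, which reproduces the quoted 10.08 Gflop/s.

CLOCK_GHZ = 2.52        # SPARC64 VII clock rate
FLOPS_PER_CYCLE = 4     # assumed flops/cycle/core

def peak_per_core_gflops():
    """Theoretical peak of one core in Gflop/s."""
    return CLOCK_GHZ * FLOPS_PER_CYCLE

def system_peak_tflops(n_cores):
    """Theoretical system peak in Tflop/s for n_cores cores."""
    return n_cores * peak_per_core_gflops() / 1000.0

print(peak_per_core_gflops())             # → 10.08
print(round(system_peak_tflops(128), 2))  # → 1.29
```

Note that the quoted maximum of 1.29 Tflop/s corresponds to 128 cores at this rate, i.e. 32 quad-core processors; doubling to 256 cores gives the 2.58 Tflop/s quoted for the two-cabinet M9000-64.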

The system interconnect, called the Jupiter bus by Fujitsu, connects both the CPU boards and the I/O boards via the crossbar. From the information provided by Fujitsu the point-to-point bandwidth cannot be derived exactly, but it should exceed 8 GB/s. The aggregate bandwidth, however, is stated for both configurations. Because of the structure of a CMU, memory access is uniform between the CPUs on a board, but it is not clear whether this is also the case for memory access from other boards. Fujitsu does not state a NUMA factor for these systems, although it is highly probable that memory access is non-uniform across the entire system. From other sources it can be gathered that, compared with the earlier PRIMEPOWER series, the crossbar is doubled; when one of the two fails, communication proceeds at half the total bandwidth. The aggregate bandwidth is impressive: 737 GB/s for the M9000-64.
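One way to put the aggregate bandwidth figures in perspective is the usual machine-balance ratio, aggregate bandwidth divided by theoretical peak. The sketch below uses only the vendor-quoted numbers from the table above; the metric itself is a standard rule of thumb, not a vendor specification:

```python
# Machine balance (bytes per flop) computed from the vendor-quoted figures.
specs = {
    "M9000-32": {"aggregate_gb_s": 367.5, "peak_tflops": 1.29},
    "M9000-64": {"aggregate_gb_s": 737.0, "peak_tflops": 2.58},
}

for model, s in specs.items():
    # GB/s divided by Gflop/s gives bytes per floating-point operation.
    balance = s["aggregate_gb_s"] / (s["peak_tflops"] * 1000.0)
    print(f"{model}: {balance:.2f} B/flop")
```

Both configurations come out at roughly 0.28–0.29 B/flop, showing that the doubled crossbar scales the bandwidth in step with the peak performance.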

Fujitsu/Siemens positions the Mx000 servers at the commercial market and seems uninterested in marketing them for HPC-related work, although the specifications look quite good. On the other hand, the systems are fitted with extensive RAS features that will be much appreciated in commercial environments but that make the systems relatively costly.

Measured Performances:
No performance results in the technical/scientific area are known to us to date. This is due not only to the newness of the system but also to the lack of interest from the scientific HPC community.