The C-DAC PARAM Padma

Machine type: RISC-based distributed memory multi-processor.
Models: C-DAC PARAM Padma.
Operating system: AIX (IBM's Unix flavour), Linux.
Connection structure: Clos network.
Compilers: Fortran 77/90, C, C++.
Vendors information Web page: http://www.cdac.in/html/parampma.asp
Year of introduction: 2003.

System parameters:

Model: C-DAC PARAM Padma
Clock cycle: 1 GHz
Theor. peak performance
  Per proc. (Gflop/s): 4
  Maximal (Gflop/s): 992
Memory: 500 GB
No. of processors: 248
Comm. bandwidth
  Aggregate: 4 GB/s
  Point-to-point: 312 MB/s
  Full duplex: 235 MB/s
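
As a quick consistency check on the system parameters, the maximal peak performance should equal the processor count times the per-processor peak. The sketch below assumes 4 floating-point operations per cycle per processor; that figure is inferred here from the quoted 4 Gflop/s at 1 GHz and is not stated in the table. The function and constant names are illustrative only.

```python
# Consistency check of the peak-performance figures in the system parameters.
CLOCK_GHZ = 1.0        # clock cycle, from the table
FLOPS_PER_CYCLE = 4    # assumption: inferred from 4 Gflop/s at 1 GHz
N_PROCESSORS = 248     # number of processors, from the table


def peak_gflops(clock_ghz, flops_per_cycle, n_proc=1):
    """Theoretical peak in Gflop/s for n_proc identical processors."""
    return clock_ghz * flops_per_cycle * n_proc


per_proc = peak_gflops(CLOCK_GHZ, FLOPS_PER_CYCLE)
maximal = peak_gflops(CLOCK_GHZ, FLOPS_PER_CYCLE, N_PROCESSORS)

print(f"Per processor: {per_proc:.0f} Gflop/s")
print(f"Maximal:       {maximal:.0f} Gflop/s")
```

Under this assumption the two results agree with the table entries of 4 and 992 Gflop/s.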

Remarks:

The PARAM Padma is the newest system made by the Indian C-DAC. It is built somewhat asymmetrically from 54 4-processor SMPs and one 32-processor node. All nodes employ 1 GHz IBM POWER4 processors. The interconnection network is C-DAC's own PARAMnet-II, for which a peak bandwidth of 2.5 Gb/s (312 MB/s) is quoted, with a latency for short messages of ≈ 10 µs. The network is built from 16-port PARAMnet-II switches and has a Clos64 topology, very similar to the structure used by Myrinet. No MPI results over this network are available.
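
The node composition and the link bandwidth quoted in the remarks can be checked with a few lines of arithmetic. This is a minimal sketch, assuming decimal units (1 Gb = 1000 Mb) and 8 bits per byte for the conversion; the names are illustrative and not part of any C-DAC software.

```python
# Node composition and link bandwidth as quoted in the remarks.
SMP_NODES = 54       # number of 4-processor SMP nodes
SMP_PROCS = 4        # processors per SMP node
LARGE_NODES = 1      # the single large node
LARGE_PROCS = 32     # processors in the large node
LINK_GBIT_S = 2.5    # PARAMnet-II peak bandwidth in Gb/s


def total_processors():
    """Processor count implied by the node composition."""
    return SMP_NODES * SMP_PROCS + LARGE_NODES * LARGE_PROCS


def gbit_to_mbyte(gbit_s):
    """Convert Gb/s to MB/s (assumption: decimal units, 8 bits per byte)."""
    return gbit_s * 1000 / 8


n_proc = total_processors()
link_mbyte_s = gbit_to_mbyte(LINK_GBIT_S)

print(f"Processors:     {n_proc}")
print(f"Link bandwidth: {link_mbyte_s} MB/s")
```

The processor count comes out at 248, matching the system parameters, and the conversion gives 312.5 MB/s, which the text quotes as 312 MB/s.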

C-DAC already has a long tradition of building parallel machines and has always provided its own software to go with them. Accordingly, the Padma comes with Fortran 90, C(++), MPI, and a Parallel File System.

Measured Performances:
The Padma attains 532 Gflop/s on the HPC Linpack benchmark (see [54]) for a linear system of size N = 224,000 on a 62-node machine with a theoretical peak of 992 Gflop/s. That amounts to an efficiency of 53.6% for this benchmark.
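
The efficiency figure is simply the measured performance divided by the theoretical peak. The sketch below reproduces it and, as a plausibility check, estimates how much of the 500 GB of memory the N = 224,000 coefficient matrix would occupy. It assumes 8-byte double-precision elements and decimal gigabytes (10^9 bytes); neither is stated in the text, and the variable names are illustrative.

```python
# Linpack efficiency and approximate matrix footprint for the quoted run.
RMAX_GFLOPS = 532.0     # measured Linpack performance
RPEAK_GFLOPS = 992.0    # theoretical peak performance
N = 224_000             # order of the linear system
MEMORY_GB = 500.0       # total memory, from the system parameters
BYTES_PER_ELEMENT = 8   # assumption: double-precision matrix elements

efficiency = RMAX_GFLOPS / RPEAK_GFLOPS

# A dense N x N matrix; assumption: 1 GB = 10^9 bytes.
matrix_gb = BYTES_PER_ELEMENT * N * N / 1e9
memory_fraction = matrix_gb / MEMORY_GB

print(f"Efficiency:       {efficiency:.1%}")
print(f"Matrix size:      {matrix_gb:.1f} GB")
print(f"Fraction of 500 GB: {memory_fraction:.0%}")
```

Under these assumptions the matrix takes roughly 401 GB, about 80% of the installed memory, which is consistent with a problem size chosen to fill most of the machine.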