          How to run the EuroBen distributed-memory benchmark.
          ====================================================

Below you find instructions for installing and running the
distributed-memory version of the EuroBen benchmark. It contains
a subset of the single-CPU benchmark and some additions that do not
make sense in a single-CPU environment.
If you run into trouble, please send mail to:

             Aad van der Steen; steen@hpcresearch.nl 
             ---------------------------------------

The benchmark has the following structure:

                 |- Makefile
                 |
                 |- install/
                 |- basics/
                 |
                 |- mod1h/
                 |- mod1i/
                 |- mod1j/
                 |- mod1k/
eurobenV2.1-dm/ -|
                 |- mod2a/
                 |- mod2as/
                 |- mod2am/
                 |- mod2b/
                 |- mod2ci/
                 |- mod2cr/
                 |- mod2f/
                 |- mod2g/
                 |- mod2h/
                 |- mod2i/

The Makefile in eurobenV2.1-dm/ can be used for easy installation of
the following 14 programs:

mod1h  -- A test for the bandwidth of some communication patterns.
mod1i  -- A test for the speed of a distributed dot product.
mod1j  -- Very precise point-to-point bandwidth measurement.
mod1k  -- Tests one-sided communication via MPI_Put/Get. So,
          TO RUN THIS PROGRAM MPI-2 IS REQUIRED!!!
----------
mod2a  -- A test for the speed of a dense matrix-vector multiplication c = Ab.
mod2as -- A test for sparse matrix-vector multiplication (CRS format) c = Ab.
mod2am -- A test for the speed of dense matrix-matrix multiplication C = AB.
mod2b  -- A test for the speed of solving a dense linear system Ax = b.
mod2ci -- A test for the speed of solving a sparse linear non-symmetric
          system Ax = b of Finite Element type. Matrix is in CRS format.
mod2cr -- A test for the speed of solving a sparse linear symmetric
          system Ax = b stemming from a 3-D finite difference problem.
mod2f  -- A test for the speed of a 1-D Fast Fourier Transform.
mod2g  -- A test for the speed of a 2-D Haar Wavelet Transform.
mod2h  -- A test for the speed of a random number generator.
mod2i  -- A test for the speed of sorting Integers and 64-bit Reals.

We assume that, at least the first time, you will want to run
all the programs with the same compiler options.

1) cd basics/
   1a - Modify the subroutine 'state.f' such that it reflects the state
        of the system: type of machine, compiler version, compiler
        options, OS version, etc.
         
2) Go back to eurobenV2.1-dm/
        Do a 'make state':
   2a - The 'state.f' routine that you have modified is copied to all
        the program directories.
   2b - The 'numerics.f' file containing the module that defines the
        range and precision for floating-point constants and variables
        is copied to all the program directories.

3) cd install/
   3a - In install/ you will find a header file with definitions for
        the 'make' utility.
        Modify 'Make.Incl' such that it contains the correct
        name for the Fortran 90 compiler, the loader (usually the same
        as the compiler), and the options for the Fortran 90 compiler.
        There are default definitions for the MPI include (INCS)
        and library (LIBS) paths. Define these as appropriate for your
        system.
        
4) Go back to eurobenV2.1-dm/
   4a - Do a 'make make': This will cause the Makefiles in all program
        directories to be completed according to the specifications you
        made in 'install/Make.Incl'.

5) Do a 'make makeall': In all the program directories the programs will
   be compiled and the executables will be made. This will take a few
   minutes. The executable names are x.<prog> for any program <prog>.
   REMARK: If your MPI implementation is not fully MPI-2 compliant,
           probably no executable x.mod1k will be produced because of
           unresolved references to MPI_Win_Create, MPI_Put, and MPI_Get.
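
Steps 1) to 5) can be summarised as the command sequence sketched
below. The 'make' targets and file names are taken from the steps
above. The helper 'run', the variable DRY_RUN, and the use of $EDITOR
(falling back to vi) are assumptions made for this sketch and are not
part of EuroBen.

```shell
# Print each command; execute it unless DRY_RUN is set.
# (DRY_RUN is a helper invented for this sketch, not part of EuroBen.)
run() {
    echo "+ $*"
    if [ -z "${DRY_RUN:-}" ]; then
        "$@"
    fi
}

# Steps 1) to 5), started from the directory that contains eurobenV2.1-dm/.
build_euroben() {
    run cd eurobenV2.1-dm/basics     || return 1   # 1
    run "${EDITOR:-vi}" state.f      || return 1   # 1a: describe the system
    run cd ..                        || return 1   # 2
    run make state                   || return 1   # 2a, 2b
    run cd install                   || return 1   # 3
    run "${EDITOR:-vi}" Make.Incl    || return 1   # 3a: compiler, INCS, LIBS
    run cd ..                        || return 1   # 4
    run make make                    || return 1   # 4a
    run make makeall                               # 5
}
```

Setting DRY_RUN=1 before calling build_euroben only lists the commands,
which is a cheap way to check the order before anything is compiled.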

6) Now the programs are ready to be run: You can run all programs
   automatically by setting the environment variable NPROCS to the
   desired number of processors, for example:
   export NPROCS=5 (for Bourne/bash/Korn shell-like shells), or
   setenv NPROCS 5 (for csh/tcsh-like shells).
   All programs will then be run on 5 processors 
   ================
   | EXCEPT  FOR: |
   ================
   6a - Program mod1h: This program does bandwidth measurements
        that need exactly 4, 6, or 8 processes/ors. By default it
        will be run on 6 processors.
   6b - Program mod1j: This program does point-to-point bandwidth
        measurements and needs exactly 2 processes/ors. It will be
        run on 2 processors irrespective of the value of NPROCS.
   6c - Program mod1k: This program does one-sided point-to-point
        bandwidth measurements and needs exactly 2 processes/ors.
        It will be run on 2 processors irrespective of the value
        of NPROCS.
   6d - Program mod2f: This program must run on a number of
        processors that is a power of 2: p = 2^m. It will be run on
        2^n processors such that n is the maximum value for which
        2^n <= NPROCS.
   6e - Program mod2g: This program must run on a number of
        processors that is a power of 2: p = 2^m. It will be run on
        2^n processors such that n is the maximum value for which
        2^n <= NPROCS.
   The results are collected in a directory Log.<hostname>.

   Alternatively: Go to each of the individual directories and run
   the programs with the number of processes you like (subject to the
   restrictions mentioned above).
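
The rules 6a - 6e can be illustrated with the small shell sketch below.
It only restates the process counts given above; the function names
'largest_pow2' and 'nprocs_for' are hypothetical and do not occur in
the EuroBen scripts, which may compute these values differently.

```shell
# Largest power of two that does not exceed $1 (for $1 >= 1).
largest_pow2() {
    p=1
    while [ $((p * 2)) -le "$1" ]; do
        p=$((p * 2))
    done
    echo "$p"
}

# Process count for program $1 when NPROCS has the value $2.
nprocs_for() {
    case "$1" in
        mod1h)        echo 6 ;;             # 6a: default (4, 6, or 8 allowed)
        mod1j|mod1k)  echo 2 ;;             # 6b, 6c: always 2
        mod2f|mod2g)  largest_pow2 "$2" ;;  # 6d, 6e: largest 2^n <= NPROCS
        *)            echo "$2" ;;          # all other programs: NPROCS
    esac
}
```

For NPROCS=5, 'nprocs_for mod2f 5' yields 4, 'nprocs_for mod1j 5'
yields 2, and 'nprocs_for mod2a 5' yields 5.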

========================================================================
                         Timing considerations
========================================================================

Your system might be so fast that the resolution of the timing
routine (MPI_Wtime) is not sufficient for one or more of the programs.
This is indicated by an execution time in the result file that is <=
1.0e-9 seconds or even negative, and a highly unlikely speed of
> 10e+10 M(fl)op/s. In this case do the following for any of the
programs <prog>:

7) cd <prog>/
   7a - In the directory '<prog>/' you will find an input file
        '<prog>.in'. This file describes the problem sizes and the
        repeat counts for the problems to be run. The last column is
        the repeat count. Increase the repeat counts to values you
        think appropriate.
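
The check and the change described above could be scripted roughly as
follows. This is a sketch under assumptions: the exact layout of the
result files and of '<prog>.in' is not described here, so the first
function takes the time and speed as arguments, and the second assumes
that data lines consist only of whitespace-separated non-negative
integers. Both function names are hypothetical. Note that rewriting a
line changes its column spacing, so check that the program still reads
the file correctly, or simply edit the file by hand.

```shell
# Succeeds (exit status 0) if the time $1 (seconds) is <= 1.0e-9
# or the speed $2 (M(fl)op/s) is > 10e+10; fails otherwise.
suspicious_timing() {
    awk -v t="$1" -v s="$2" \
        'BEGIN { exit !((t + 0 <= 1.0e-9) || (s + 0 > 10e+10)) }'
}

# Multiply the last column of every purely numeric line of file $1 by
# the factor $2 and write the result to standard output. Lines that
# contain anything other than integers are passed through unchanged.
scale_repeat_counts() {
    awk -v f="$2" '
        /^[ \t]*[0-9]+([ \t]+[0-9]+)*[ \t]*$/ { $NF = $NF * f }
        { print }
    ' "$1"
}
```

A possible use is 'scale_repeat_counts mod2a.in 10 > mod2a.in.new',
followed by a comparison of the two files before the original is
replaced.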

========================================================================
                   Customising the runs: (OPTIONAL)
========================================================================
You might want to run some of the programs with other than the
standard compiler options as specified in the procedure above.
In that case do the following for any of the programs <prog>:

8) cd <prog>/
   8a - Modify the definition of 'FFLAGS' in the Makefile.
   8b - Modify the compiler options line in subroutine 'state.f'.
   8c - Do a 'make veryclean': this will remove all old objects and
        the executable.
   8d - Do a 'make'.

9) Run the resulting program x.<prog> as before.

10) If you want to use problem sizes other than those initially defined,
    edit the file '<prog>.in' in the relevant <prog> directory. The
    '<prog>.in' files contain the size parameters and often repeat counts
    for making the timing more reliable. Change the '<prog>.in' file(s)
    as you wish and rerun the program(s) as described under 6).



                         =====================
                         | Best of success!! |
                         =====================

