MPI benchmark on a quad-core Intel Clovertown system using the
SSE3 units. Standard compiler options:
"-fast -ip -tune pn4 -arch pn4 -axP -tpp7 -openmp".
Remarks:
-        : Intel MPI 3.0 does not work: the loader can find the library but
           it cannot find the PMPI_*** wrapper routines in the library, so
           no executables are produced.
           Instead, ANL MPICH2, version 1.0.5 with SMP support is used. This
           gives results that scale very well for all programs, except
           'mod1k', see below.
- mod1k  : The run of mod1k (latency and bandwidth of MPI_Get/Put) is
           problematic. For small messages the program never terminates.
           For larger messages MPI_Get gives no results when the program
           header is printed by processor 0, but gives (wrong) results when
           the header is printed by processor 1. The values that processor 1
           should receive with MPI_Get turn out never to be initialised. As
           the program runs correctly on other platforms, we have to conclude
           that this is a compiler/MPI error with no obvious solution.

