February 15, 2011
-----------------
The MPI versions of the programs for which this was relevant have been run
with up to 64 cores. The naming of the log files is somewhat involved and is
explained here:

<prog>.p<AB>-<XY>

indicates a run on <AB> cores with <XY> cores/node.
So, e.g., mod2a.p32-04.log gives the result of a run with 32 cores (processes)
at 4 cores/node, i.e., 4 cores were used on each of 8 nodes.
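As an illustration, the naming scheme above can be decoded mechanically. The
following Python sketch (the helper name parse_logname is ours, not part of
the test suite) splits a log-file name into its components and derives the
node count:

```python
import re

def parse_logname(name):
    """Decode a log name of the form <prog>.p<AB>-<XY>.log into
    (program, total cores, cores per node, nodes)."""
    m = re.match(r'(?P<prog>.+)\.p(?P<cores>\d+)-(?P<per_node>\d+)\.log$', name)
    if m is None:
        raise ValueError('not a recognised log name: %s' % name)
    cores = int(m.group('cores'))
    per_node = int(m.group('per_node'))
    # The number of nodes follows from total cores / cores per node.
    return m.group('prog'), cores, per_node, cores // per_node

# mod2a.p32-04.log: 32 cores at 4 cores/node, hence 8 nodes.
print(parse_logname('mod2a.p32-04.log'))  # ('mod2a', 32, 4, 8)
```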

Few problems were encountered, except for two programs:
- mod1k: (using MPI_Put/MPI_Get) failed to run, without giving any message.
- mod2i: for nprocs >= 28 a segmentation fault was reported. Enlarging the
         message buffers for exchanging the partially sorted arrays, a
         possible culprit, did not help. So we do not know (yet) what the
         problem is.

A general observation is that using a small number of cores on many nodes gives
higher performance than using many cores on a smaller number of nodes. This
suggests that either the internal bandwidth within a node is a bottleneck or
that the MPI implementation is not well optimised for message passing within a
node.
