MPI benchmark on Cray X1E (1.13 GHz vector processor in MSP mode)
Remarks:
Note that the results are not fully consistent: programs 'mod2a',...,'mod2h'
were not run on the default 1, 4, and 16 processors, but on the number of
processors that were available at that time, i.e., 10--12 at most.

- mod2a  : On more than 1 processor we get incorrect execution, which is
           detected by the internal check routine.
           On 4 processors we get consistent errors in the first 25
           elements of the result vector.
           On 10 processors we get such errors in the first 10 elements
           of the result in the row-wise implementations (straight and
           4 times unrolled) and in the first 20 elements of the
           column-wise implementation. The errors always occur in the
           transposed cases; the normal case is always correct.
- mod2as : On more than 1 processor we get an error in the main program
           mod2as.f:
	   
	   	    Fault: floating point divide by zero
          (the address of the instruction at fault is not precise)

           Traceback for process 352474(msp mode, ssp 0) apid 352423.1 on node 0
                   mod2as_+0x0D20 (0x8001004100) at mod2as.f
           Fault: floating point divide by zero
                 (the address of the instruction at fault is not precise)
           Floating exception
	   
           So, there is only a 1-processor result in mod2as.p01.log
- mod2f  : The MPI_Alltoallv routine causes problems when attempting to
           run on 1 processor:
	   
	   Traceback for process 354529(msp mode, ssp 0) apid 354529.0 on node 0
           pmpi_alltoallv_+0x1460 (0x1098DA0) at alltoallv.c
                   gtrans_+0x0258 (0x100EBF8) at gtrans.f:31
            Fault: Attempt to dereference null pointer: 0x0
            Segmentation fault
	    
            On more than 1 processor the addresses to send to/receive from
            seem to be incorrect:
            Traceback for process 354846(msp mode, ssp 0) apid 354846.0 on node 0
                  cntdpls_+0x0168 (0x100C388) at cntdpls.f
                   gtrans_+0x0208 (0x100EBA8) at gtrans.f:27
            Fault: unable to access memory address: 0x3841000000
            Segmentation fault
	    
	    So, there are no results for this program.
- mod2g   : When executing on more than 1 processor we get problems in
            the data generation routine gendat.f:
	    
            Traceback for process 355108(msp mode, ssp 0) apid 355065.2 on node 0
                   gendat_+0x00C0 (0x100010062A0) at gendat.f
                    mod2g_+0x0F7C (0x1000100431C) at mod2g.f:65
            Fault: invalid floating point operation
            (the address of the instruction at fault is not precise)
	    
            The message "invalid floating point operation" is surprising as
            no floating-point operation is done, except an implicit conversion
            from an Integer value to an 8-byte Real.
	    
	    So, only a 1-processor result is present in mod2g.p01.log
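
Note on the mod2f failures: both faults point at the arguments handed to
MPI_Alltoallv (a null pointer on 1 processor, an out-of-range address on more
than 1). The sketch below is not taken from the benchmark sources; it only
illustrates, under assumptions, the kind of sanity check that can be run on
the count and displacement arrays before the call. The function names
displacements and check_alltoallv_args are hypothetical, and it is an
assumption that the blocks are packed contiguously in the buffer.

```python
def displacements(counts):
    """Exclusive prefix sum: displs[i] is the offset of the block for rank i.

    Assumes the blocks are packed back to back in the buffer (hypothetical
    layout, not confirmed by the benchmark sources).
    """
    displs = []
    offset = 0
    for c in counts:
        displs.append(offset)
        offset += c
    return displs


def check_alltoallv_args(counts, displs, buffer_len):
    """Return a list of problems; an empty list means the arguments look sane."""
    problems = []
    if len(counts) != len(displs):
        problems.append("counts and displs differ in length")
        return problems
    for rank, (c, d) in enumerate(zip(counts, displs)):
        if c < 0:
            problems.append(f"rank {rank}: negative count {c}")
        elif d < 0 or d + c > buffer_len:
            problems.append(
                f"rank {rank}: block [{d}, {d + c}) outside buffer "
                f"of length {buffer_len}"
            )
    return problems


if __name__ == "__main__":
    counts = [3, 0, 2]
    displs = displacements(counts)
    print(displs)
    print(check_alltoallv_args(counts, displs, buffer_len=sum(counts)))
```

A check of this kind catches blocks that fall outside the buffer, which is
one possible cause of an "unable to access memory address" fault. It does not
explain the null-pointer fault on 1 processor, which occurs inside the
library routine itself.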
	    	   	    
