mod2f:
------
The 1-D FFT is now executed as a series of 2-D FFTs, in the same way as in a
distributed-memory implementation. This new implementation scales much
better than the previous one.
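
As an illustration only, one common way to organise a 1-D FFT over a 2-D
array is the so-called four-step decomposition: a vector of length n1*n2 is
viewed as an n1 x n2 matrix, short transforms run down the columns, the
result is scaled by twiddle factors, and short transforms run along the rows.
The sketch below is not taken from the mod2f source; the function names, the
Python language, and the naive short DFT are assumptions chosen for brevity.

```python
import cmath


def dft(v):
    """Naive O(n^2) DFT; a stand-in for an optimized short FFT (assumption)."""
    n = len(v)
    return [
        sum(v[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def fft_four_step(x, n1, n2):
    """Hypothetical helper: 1-D DFT of length n1*n2 via column and row transforms."""
    n = n1 * n2
    assert len(x) == n
    # Step 1: view x as an n1 x n2 row-major matrix and transform each column
    # (length n1). cols[j2][k1] holds the result for column j2.
    cols = [dft([x[j1 * n2 + j2] for j1 in range(n1)]) for j2 in range(n2)]
    rows = []
    for k1 in range(n1):
        # Step 2: multiply by the twiddle factors exp(-2*pi*i*j2*k1/n).
        row = [
            cols[j2][k1] * cmath.exp(-2j * cmath.pi * j2 * k1 / n)
            for j2 in range(n2)
        ]
        # Step 3: transform each row (length n2).
        rows.append(dft(row))
    # Step 4: read the result out transposed: X[k1 + n1*k2] = rows[k1][k2].
    out = [0j] * n
    for k1 in range(n1):
        for k2 in range(n2):
            out[k1 + n1 * k2] = rows[k1][k2]
    return out
```

The column transforms in step 1 are independent of each other, as are the row
transforms in step 3, which is what makes this layout attractive for
parallel execution.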

mod2h:
------
Performance is now reported in Mega-random numbers/s instead of Mop/s, because
this unit better reflects the performance of this particular kernel.
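
A minimal sketch of how such a rate can be computed, assuming it is simply
the count of generated numbers divided by the elapsed time in seconds and by
10^6. The function names and the use of Python's standard generator are
assumptions for illustration, not the mod2h code.

```python
import random
import time


def mega_randoms_per_second(count, seconds):
    """Rate in millions of random numbers per second (assumed definition)."""
    return count / seconds / 1.0e6


def time_generator(count):
    """Hypothetical timing loop around Python's standard generator."""
    start = time.perf_counter()
    for _ in range(count):
        random.random()
    elapsed = time.perf_counter() - start
    return mega_randoms_per_second(count, elapsed)
```

For example, 5,000,000 numbers generated in 2.0 s correspond to
2.5 Mega-random numbers/s.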
