|
Until two years ago SRC was the only company that sold a full stand-alone
FPGA-accelerated system, the SRC-7. It now shares this space with Convey and
Kuberre. SRC also sells the so-called SRC-7 MAP station, the MAP being the
processing unit that contains two Altera Stratix II FPGAs. Furthermore, SRC
offers the IMAP card, which can be plugged into a PCIe slot of any PC.
SRC has gone to great lengths to ban the term FPGA from its documentation.
Instead it talks about implicit vs. explicit computing. In SRC terms implicit
computing is performed on standard CPUs while explicit computing is done on its
(reconfigurable) MAP processor. The SRC-7 systems have been designed with the
integration of both types of processors in mind. In this sense they form a
hybrid architecture, also because shared extended memory can be put into the
system that is equally accessible by both the CPUs and the MAP processors.
We show a sketch of the machine structure in Figure 29.
Figure 29: Approximate machine structure of the SRC-7.
It shows that CPUs and MAP processors are connected by a 16×16 so-called
Hi-Bar crossbar switch with a link speed of 7.2 GB/s. The maximum aggregate
bandwidth in the switch is 115.2 GB/s, enough to route all 16 independent data
streams. The CPUs must be of the x86 or x86_64 type, so both Intel and AMD
processors are possible. As can be seen in the figure, the connection to the
CPUs is made through SRC's proprietary SNAP interface. This accommodates the
7.2 GB/s bandwidth but isolates it from the vendor-specific connection to
memory. Instead of a MAP processor, common extended memory can also be configured.
This allows for shared-memory parallelism in the system across CPUs and MAP
processors.
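The aggregate figure quoted above follows directly from the port count and the
per-link speed. A minimal sketch of that arithmetic, with names chosen here
purely for illustration:

```python
# Aggregate Hi-Bar bandwidth from the figures quoted in the text.
PORTS = 16            # 16x16 crossbar: 16 independent data streams
LINK_SPEED_GBS = 7.2  # per-link speed in GB/s


def aggregate_bandwidth(ports: int, link_speed_gbs: float) -> float:
    """Return the bandwidth needed to route `ports` streams concurrently."""
    return ports * link_speed_gbs


total = aggregate_bandwidth(PORTS, LINK_SPEED_GBS)
print(f"{total:.1f} GB/s")  # prints "115.2 GB/s"
```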
The MAP station is a scaled-down version of the SRC-7: it contains an x86(_64)
CPU, a MAP processor, and a 4×4 Hi-Bar crossbar that allows common extended
memory to be configured.
SRC and Convey are the only accelerator vendors that support Fortran. SRC does
this through its development environment Carte. As with Convey and Kuberre,
C/C++ is also available. The parallelisation and acceleration are largely done
by putting comment directives in Fortran code and pragmas in C/C++ code.
Explicit memory management and prefetching can also be done in this way. The
directives/pragmas cause a bitstream to be loaded onto the FPGAs in one or more
MAP processors that configures them and executes the target code. Furthermore,
there is an extensive library of functions, a debugger and a performance
analyzer. When one wants to employ specific non-standard functionality, e.g.,
computing with arithmetic of non-standard length, one can create a so-called
Application Specific Functional Unit. In fact, one then configures one or more of
the FPGAs directly and one has to fall back on VHDL or Verilog for this
configuration.
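To make "arithmetic of non-standard length" concrete: a CPU offers fixed widths
such as 32 or 64 bits, whereas a functional unit on an FPGA can be built for,
say, 24 bits. The Python sketch below only emulates such wrap-around arithmetic
in software. The 24-bit width and the helper names are illustrative assumptions;
they do not reflect the Carte API or any actual SRC functional unit.

```python
def make_adder(width: int):
    """Return an unsigned adder that wraps around at `width` bits."""
    mask = (1 << width) - 1  # e.g. 0xFFFFFF for width 24

    def add(a: int, b: int) -> int:
        # Keep only the low `width` bits, as fixed-width hardware would.
        return (a + b) & mask

    return add


add24 = make_adder(24)  # hypothetical 24-bit unit
print(hex(add24(0x123456, 0x000001)))  # prints 0x123457
print(hex(add24(0xFFFFFF, 0x000001)))  # prints 0x0 (overflow wraps)
```

In hardware the same effect comes for free from the width of the datapath,
which is why such units are described in VHDL or Verilog rather than in
Fortran or C.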
|