Obtaining High Performance via Lower-Precision FPGA Floating Point Units
 
Junqing Sun1 (jsun5@utk.edu)
Advisors: Gregory Peterson1, Olaf Storaasli2
1University of Tennessee, Knoxville, 2Oak Ridge National Laboratory
First Place Award, ACM Student Research Competition at SC07
Abstract
  Because of their intrinsic parallelism, flexibility, and pipelining capability, FPGAs show great potential for accelerating computationally intensive applications. Experimental results and vendor specifications show that lower-precision floating-point components on FPGAs consume fewer resources, require less memory bandwidth, and can run at higher clock frequencies than higher-precision components. This research addresses high-performance linear equation solvers that employ lower-precision floating-point arithmetic. High accuracy in the final solution is recovered by a few higher-precision iterative refinement steps that reuse the lower-precision intermediate results.
  We implement a mixed-precision hybrid direct solver on the Cray-XD1 supercomputer at Oak Ridge National Laboratory. The solver maps most of the work to FPGAs for fast lower-precision computation and uses the host processors to refine the final solution to higher accuracy. Test results on the Cray-XD1 show that our mixed-precision algorithm and design achieve the same accuracy as computing the complete algorithm in higher precision, while delivering a significant speedup over a 2.2 GHz Opteron processor.
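The refinement idea described in the abstract can be illustrated with a minimal software sketch. It is not the authors' FPGA design: it assumes the lower precision is IEEE 754 single and the higher precision is double (the abstract does not name the formats), emulates single precision by rounding through Python's `struct` module, and uses the textbook LU-based iterative refinement scheme. The function names (`to_single`, `lu_factor_single`, `lu_solve_single`, `mixed_precision_solve`) are hypothetical and chosen only for this illustration.

```python
import struct


def to_single(x):
    """Round a Python float (double) to the nearest IEEE 754 single value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def lu_factor_single(a):
    """LU factorization with partial pivoting; every result is rounded to
    single precision. This stands in for the lower-precision (FPGA) stage."""
    n = len(a)
    lu = [[to_single(v) for v in row] for row in a]
    perm = list(range(n))
    for k in range(n):
        # Pick the largest pivot in column k to keep the factorization stable.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        lu[k], lu[p] = lu[p], lu[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] = to_single(lu[i][k] / lu[k][k])
            for j in range(k + 1, n):
                lu[i][j] = to_single(lu[i][j] - to_single(lu[i][k] * lu[k][j]))
    return lu, perm


def lu_solve_single(lu, perm, b):
    """Forward and back substitution in emulated single precision."""
    n = len(lu)
    y = [to_single(b[perm[i]]) for i in range(n)]
    for i in range(n):                      # L has an implicit unit diagonal
        for j in range(i):
            y[i] = to_single(y[i] - to_single(lu[i][j] * y[j]))
    for i in reversed(range(n)):            # solve with the upper factor U
        for j in range(i + 1, n):
            y[i] = to_single(y[i] - to_single(lu[i][j] * y[j]))
        y[i] = to_single(y[i] / lu[i][i])
    return y


def mixed_precision_solve(a, b, tol=1e-12, max_iter=20):
    """Solve a x = b: factor once in single precision, then refine in double.

    Returns the solution and the number of refinement steps performed.
    The tolerance and iteration cap are assumptions made for this sketch.
    """
    n = len(b)
    lu, perm = lu_factor_single(a)          # expensive step, lower precision
    x = lu_solve_single(lu, perm, b)
    b_scale = max(abs(v) for v in b)
    for iteration in range(max_iter):
        # Residual in double precision (the host-processor role).
        r = [b[i] - sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
        if max(abs(v) for v in r) <= tol * b_scale:
            return x, iteration
        # Correction reuses the single-precision factors; update in double.
        d = lu_solve_single(lu, perm, r)
        x = [xi + di for xi, di in zip(x, d)]
    return x, max_iter
```

Calling `mixed_precision_solve(a, b)` on a small, well-conditioned system returns the refined solution together with the number of refinement steps taken. Comparing it against `lu_solve_single` alone shows the accuracy recovered by the double-precision residual updates. The factorization, which dominates the cost for large systems, is performed only once and only in the lower precision.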
Poster