|Workload type||Lucas-Lehmer test|
gpuLucas parallelizes the IBDWT (irrational base discrete weighted transform) method for fast multiplication modulo Mersenne numbers.
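For orientation, the Lucas-Lehmer test that this multiplication serves can be sketched in a few lines of Python. This is only a reference sketch using Python's built-in big integers, not gpuLucas's GPU code and not an IBDWT implementation. The function names `mod_mersenne` and `lucas_lehmer` are illustrative and do not come from the gpuLucas source. The squaring step `s * s` is the operation that an IBDWT-based program would replace with an FFT-based multiply.

```python
def mod_mersenne(x: int, p: int) -> int:
    """Reduce a non-negative x modulo 2**p - 1 using only shifts and adds."""
    m = (1 << p) - 1
    while x > m:
        # 2**p is congruent to 1 (mod 2**p - 1), so the high bits fold onto the low bits.
        x = (x & m) + (x >> p)
    return 0 if x == m else x


def lucas_lehmer(p: int) -> bool:
    """Return True if 2**p - 1 is prime, assuming p itself is prime."""
    if p == 2:
        return True  # 2**2 - 1 = 3 is prime; the loop below needs p > 2
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        # s -> s*s - 2 (mod 2**p - 1); the squaring dominates the run time.
        s = mod_mersenne(s * s, p) - 2
        if s < 0:
            s += m
    return s == 0


if __name__ == "__main__":
    exponents = [3, 5, 7, 11, 13, 17, 19, 23, 29, 31]
    print([p for p in exponents if lucas_lehmer(p)])
```

For exponents in the range tested by GPU programs, the numbers being squared have millions of digits, which is why the multiplication method matters far more than the surrounding loop.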
gpuLucas was developed under Windows 7 using Visual C++ in Visual Studio 2008 as a research code, but has since been ported to Linux.
At the time of its announcement in 2010, it ran nearly twice as fast as CUDALucas because it used non-power-of-2 FFT lengths. Implementation details were published in 2011, and the source code was released in 2012 under the BSD license (even though the supporting libraries use the GPL).
The code was further developed by Aaron Haviland, who improved I/O and added autodetection of the optimal FFT size, checkpoint saving, and other features.