University of North Carolina at Chapel Hill
SiftGPU is an implementation of SIFT [1] for GPU. It performs pyramid construction, keypoint detection, and descriptor generation on the GPU. Besides processing pixels and features in parallel on the GPU, this implementation also reduces readback time by generating compact feature lists with GPU algorithms [3].
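To make the compact feature list idea concrete, here is a minimal CPU sketch of point list generation with a histogram pyramid in the spirit of [3]. It is a 1D simplification written for illustration only: the function names are hypothetical, and it is not SiftGPU's shader code, which works on 2D textures on the GPU.

```cpp
#include <cstddef>
#include <vector>

using Level = std::vector<int>;

// Build the pyramid bottom-up. Level 0 holds one 0/1 flag per pixel
// (1 = keypoint); every higher level stores the sum of its two children.
std::vector<Level> build_histogram_pyramid(Level flags) {
    std::size_t n = 1;
    while (n < flags.size()) n *= 2;      // pad the base to a power of two
    flags.resize(n, 0);
    std::vector<Level> levels;
    levels.push_back(flags);
    while (levels.back().size() > 1) {
        const Level& below = levels.back();
        Level above(below.size() / 2);
        for (std::size_t i = 0; i < above.size(); ++i)
            above[i] = below[2 * i] + below[2 * i + 1];
        levels.push_back(above);          // 'below' is not used after this
    }
    return levels;
}

// Find the base position of the k-th keypoint (0-based) by walking
// from the top of the pyramid down to level 0.
std::size_t locate(const std::vector<Level>& levels, int k) {
    std::size_t idx = 0;
    for (std::size_t l = levels.size() - 1; l > 0; --l) {
        const std::size_t left = 2 * idx;
        const int left_count = levels[l - 1][left];
        if (k < left_count) {
            idx = left;                   // k-th keypoint is in the left child
        } else {
            k -= left_count;              // skip the keypoints on the left
            idx = left + 1;
        }
    }
    return idx;
}

// Produce the compact list of keypoint positions.
std::vector<std::size_t> compact(const Level& flags) {
    const std::vector<Level> levels = build_histogram_pyramid(flags);
    const int total = levels.back()[0];   // top of the pyramid = keypoint count
    std::vector<std::size_t> out;
    for (int k = 0; k < total; ++k) out.push_back(locate(levels, k));
    return out;
}
```

Each output entry is found independently of the others, which is what makes the scheme suitable for parallel execution: only the short list of positions needs to be read back instead of the full keypoint mask.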
SiftGPU borrows a lot from Andrea Vedaldi's sift++ [2]. Many parameters of sift++ (for example, number of octaves, number of DOG levels, edge threshold, etc.) are also available in SiftGPU. Shader programs are dynamically generated according to those parameters.
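As an illustration of what generating a shader from parameters can look like, the sketch below folds an edge threshold into shader source text as a literal constant. The helper name and the emitted Cg/GLSL-style snippet are assumptions made for this example and are not taken from SiftGPU's generator; the edge test itself is the criterion from Lowe's paper [1], which uses r = 10.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical generator, not SiftGPU's own: returns shader-style source
// text for an edge test with the threshold baked in as a constant.
std::string make_edge_test_source(float r) {
    // Lowe's criterion [1]: keep a keypoint when tr^2 / det < (r + 1)^2 / r,
    // where tr and det are the trace and determinant of the 2x2 Hessian.
    const float limit = (r + 1.0f) * (r + 1.0f) / r;
    std::ostringstream src;
    src << std::fixed << std::setprecision(4);   // always emit float literals
    src << "// generated for edge threshold r = " << r << "\n";
    src << "bool keep_keypoint(float tr, float det) {\n";
    src << "    return det > 0.0 && tr * tr < " << limit << " * det;\n";
    src << "}\n";
    return src.str();
}
```

Because the constant is computed once on the CPU, the generated program does no per-pixel work to derive it, and changing the parameter simply produces a different program.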
Hardware requirement
The full functionality is available only on hardware that supports the Cg profiles fp40/vp40 or higher (gp4fp/gp4vp, available in Cg 2.0), for example NVIDIA 7900 or 8800. If your GPU does not support fp40/vp40 or gp4fp/gp4vp, the orientation computation of SIFT is simplified, and edge elimination and descriptor generation are skipped.
There are also some GPU parameters to play with. You can try tuning them to get the best performance.
Both a VC6 workspace and a VS2005 solution are provided: VC\SiftGPU.dsw and VC\SIftGPU.sln. Make sure you set the necessary arguments when running the binaries in VC. Please use the Release build in VC to get better performance.
A Linux makefile is now also provided.
You can download the SiftGPU package (including code, binaries, and some test images) here (8.6MB).
You may also be interested in some documents: README, History, Manual, and old slides.
Experiments
Below is the comparison with Lowe's SIFT on box.pgm, using the comparison code from Vedaldi's SIFT.

Below are the timing results on V293. (One test image was resized to different sizes with XnView for this experiment.)

Note!
0, The parameters you should use to align this implementation with Lowe's are {"-m", "-s", "-fo", "-1", "-loweo"}
1, SiftGPU may return slightly different results on different GPUs due to differences in floating-point precision.
2, Image loading and descriptor normalization are done on the CPU, so performance may vary across CPUs. Using binary PGM files can save image loading time.
3, SiftGPU may be slow if your graphics memory is not big enough, because the graphics card may automatically fall back to virtual memory.
4, It takes time for SiftGPU to allocate/reallocate texture memory. It is efficient to process a set of images of the same size (or smaller) because the memory does not need to change.
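Note 4 suggests a simple batching strategy: process the largest image first so that later images fit into memory that is already allocated. The sketch below counts reallocations for a given processing order under an assumed model, in which storage grows whenever an image is wider or taller than anything seen so far. The model, the Size struct, and the function names are illustrative assumptions and do not describe SiftGPU's actual allocator.

```cpp
#include <algorithm>
#include <vector>

struct Size {
    int width;
    int height;
};

// Count allocations under the assumed model: storage grows whenever an
// image exceeds the current capacity in width or height.
int count_reallocations(const std::vector<Size>& order) {
    int cap_w = 0, cap_h = 0, count = 0;
    for (const Size& s : order) {
        if (s.width > cap_w || s.height > cap_h) {
            cap_w = std::max(cap_w, s.width);
            cap_h = std::max(cap_h, s.height);
            ++count;
        }
    }
    return count;
}

// Return the images sorted by pixel count, largest first. When the sizes
// nest (each image fits inside the largest one), a single allocation
// suffices; sizes that do not nest can still trigger further growth.
std::vector<Size> largest_first(std::vector<Size> images) {
    std::stable_sort(images.begin(), images.end(),
                     [](const Size& a, const Size& b) {
                         return static_cast<long long>(a.width) * a.height >
                                static_cast<long long>(b.width) * b.height;
                     });
    return images;
}
```

For example, under this model three images of 320x240, 640x480, and 1024x768 processed in that order cause three allocations, while the largest-first order causes one.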
[1] D. G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60, November 2004. http://www.cs.ubc.ca/~lowe/keypoints/
[2] A. Vedaldi. sift++. http://vision.ucla.edu/~vedaldi/code/siftpp/siftpp.html
[3] G. Ziegler, A. Tevs, C. Theobalt, and H.-P. Seidel. GPU point list generation through histogram pyramids. Technical Report, June 2006.