SiftGPU: A GPU Implementation of David Lowe's Scale Invariant Feature Transform (SIFT) 

Changchang Wu

University of North Carolina at Chapel Hill


 

SIFT Implementation

SiftGPU is a GPU implementation of SIFT [1]. It performs Gaussian pyramid construction, keypoint detection and descriptor
generation on the GPU. Besides processing pixels and features in parallel on the GPU, SiftGPU also reduces readback
time by generating a compact feature list with GPU algorithms (histogram pyramids [3]).
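The compact feature list can be illustrated with a minimal CPU sketch of the histogram-pyramid idea from [3]. This is not SiftGPU's shader code. It is simplified to 1D with pairwise sums, whereas the texture-based version in [3] sums 2x2 blocks. The names buildHistoPyramid, locateKeypoint and compactKeypoints are hypothetical. Each output slot k is found by an independent top-down traversal. That independence is what lets a GPU fill the list in parallel and read back only the detected keypoints instead of the whole detection mask.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Builds a 1D histogram pyramid bottom-up from a 0/1 keypoint mask.
// Level 0 is the mask; each higher level stores sums of pairs below it.
std::vector<std::vector<int>> buildHistoPyramid(const std::vector<int>& mask) {
    assert(!mask.empty());
    std::vector<std::vector<int>> levels{mask};
    while (levels.back().size() > 1) {
        const std::vector<int>& prev = levels.back();
        std::vector<int> next((prev.size() + 1) / 2, 0);
        for (std::size_t i = 0; i < prev.size(); ++i) next[i / 2] += prev[i];
        levels.push_back(std::move(next));
    }
    return levels;  // levels.back()[0] is the total number of keypoints
}

// Finds the mask index of the k-th keypoint (0-based, k < total) by walking
// from the top level down; each k is independent of every other k.
std::size_t locateKeypoint(const std::vector<std::vector<int>>& levels, int k) {
    std::size_t idx = 0;
    for (std::size_t l = levels.size() - 1; l-- > 0;) {
        const std::size_t left = idx * 2;
        const std::vector<int>& lvl = levels[l];
        if (k < lvl[left]) {
            idx = left;           // target lies in the left child
        } else {
            k -= lvl[left];       // skip the keypoints counted on the left
            idx = left + 1;
        }
    }
    return idx;
}

// Produces the dense keypoint list; on a GPU every slot would run in parallel.
std::vector<std::size_t> compactKeypoints(const std::vector<int>& mask) {
    const auto levels = buildHistoPyramid(mask);
    const int total = levels.back()[0];
    std::vector<std::size_t> out;
    out.reserve(static_cast<std::size_t>(total));
    for (int k = 0; k < total; ++k) out.push_back(locateKeypoint(levels, k));
    return out;
}
```

For the mask {0, 1, 0, 0, 1, 1, 0, 1, 0}, compactKeypoints returns {1, 4, 5, 7}.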

SiftGPU borrows a lot from Andrea Vedaldi's sift++ [2]. Many sift++ parameters (for example, the number of octaves,
the number of DoG levels and the edge threshold) are also available in SiftGPU. Shader programs are generated
dynamically according to these parameters.
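The octave and DoG-level parameters determine how much blur each pyramid level receives. The sketch below follows the schedule in Lowe's paper [1]: with s scales per octave, k = 2^(1/s), and s + 3 Gaussian levels yield s + 2 DoG levels, starting from a base sigma of 1.6. It assumes that SiftGPU's "number of DoG levels" corresponds to s. The names OctaveSchedule and makeOctaveSchedule are hypothetical, and treating level 0 as blurred from an ideal unblurred input is a simplification.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Blur schedule for one octave (hypothetical helper type).
struct OctaveSchedule {
    std::vector<double> sigma;        // absolute blur of each Gaussian level
    std::vector<double> incremental;  // extra blur applied on top of level i-1
};

// Per-octave Gaussian schedule after Lowe [1]; s = scales per octave.
OctaveSchedule makeOctaveSchedule(int s, double sigma0 = 1.6) {
    assert(s > 0 && sigma0 > 0.0);
    const double k = std::pow(2.0, 1.0 / s);
    OctaveSchedule sch;
    for (int i = 0; i < s + 3; ++i) {
        sch.sigma.push_back(sigma0 * std::pow(k, i));
        // Gaussian blurs compose in quadrature, so only the difference is
        // applied; level 0 assumes an unblurred input (a simplification).
        const double prev = (i == 0) ? 0.0 : sch.sigma[i - 1];
        sch.incremental.push_back(
            std::sqrt(sch.sigma[i] * sch.sigma[i] - prev * prev));
    }
    return sch;  // adjacent levels are subtracted to form s + 2 DoG images
}
```

With s = 3 the schedule has six Gaussian levels, and level 3 has sigma 3.2, twice the base; in [1] that level is downsampled by 2 to start the next octave.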

Hardware requirement

The full functionality is available only on hardware that supports the Cg profiles fp40/vp40 or higher (gp4fp/gp4vp,
available in Cg 2.0), for example the NVIDIA 7900 and 8800. If your GPU does not support fp40/vp40 or gp4fp/gp4vp,
the SIFT orientation computation is simplified, and edge elimination and descriptor generation are skipped.
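Edge elimination rejects DoG extrema that lie along edges, where the principal curvature is large across the edge but small along it. The CPU sketch below implements the test from [1]. It builds the 2x2 Hessian by finite differences and keeps a point only if Tr(H)^2 / Det(H) < (r + 1)^2 / r, with r = 10 as in Lowe's paper. It assumes that SiftGPU's edge threshold parameter plays the role of r. The function name passesEdgeTest and its 3x3 input layout are hypothetical, and SiftGPU evaluates its own test inside generated shaders, which may differ in detail.

```cpp
#include <cassert>

// Edge-response test after Lowe [1]. p is a 3x3 neighbourhood of one DoG
// level, p[row][col], with the candidate keypoint at p[1][1].
bool passesEdgeTest(const float p[3][3], float edgeThreshold = 10.0f) {
    assert(edgeThreshold > 1.0f);
    // 2x2 Hessian from central finite differences.
    const float dxx = p[1][2] + p[1][0] - 2.0f * p[1][1];
    const float dyy = p[2][1] + p[0][1] - 2.0f * p[1][1];
    const float dxy = (p[2][2] - p[2][0] - p[0][2] + p[0][0]) * 0.25f;
    const float tr = dxx + dyy;
    const float det = dxx * dyy - dxy * dxy;
    if (det <= 0.0f) return false;  // curvatures differ in sign (or one is 0)
    const float r = edgeThreshold;
    // Same as tr^2 / det < (r + 1)^2 / r, rearranged to avoid a division.
    return tr * tr * r < (r + 1.0f) * (r + 1.0f) * det;
}
```

A symmetric peak passes, while a ridge or a strongly elongated peak is rejected.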

SiftGPU also exposes several GPU-specific parameters, which you can tune to get the best performance on your hardware.

Environment

Both a VC6 workspace and a VS2005 solution are provided: VC\SiftGPU.dsw and VC\SiftGPU.sln. Make sure you
set the necessary command-line arguments when running the binaries from VC. Use the Release build for better performance.

A Linux makefile is now also provided.

Download

You can download the SiftGPU package (including code, binaries and some test images) here (8.6 MB).

You may also be interested in these documents: README, History, Manual, and the old slides.

Experiments

Below is the comparison with Lowe's SIFT on box.pgm, using the comparison code from Vedaldi's SIFT.

Below is the timing result on V293. (A single test image was resized to different sizes with XnView for this experiment.)

Note!

0, The parameters to use to align this implementation with Lowe's are {"-m", "-s", "-fo", "-1", "-loweo"}.
1, SiftGPU may return slightly different results on different GPUs due to differences in floating-point precision.
2, Image loading and descriptor normalization run on the CPU, so performance may vary across CPUs.
   Using binary PGM files can save image loading time.
3, SiftGPU may be slow if your graphics memory is not large enough, because the graphics card may
   automatically fall back to virtual memory.
4, Allocating and reallocating texture memory takes time. It is efficient to process a set of
   images of the same size (or smaller), because the memory does not need to change.
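Note 2 mentions that descriptor normalization runs on the CPU. Below is a minimal sketch of the normalization described in [1]: scale the descriptor to unit length, clamp each entry at 0.2, then renormalize. The function name normalizeDescriptor is hypothetical, and the 0.2 default is taken from Lowe's paper as an assumption about SiftGPU's behaviour, not copied from its code.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// SIFT descriptor normalization after Lowe [1]: unit length, clamp, renormalize.
// Entries are assumed non-negative, as gradient-histogram bins are.
void normalizeDescriptor(std::vector<float>& d, float clampValue = 0.2f) {
    assert(!d.empty());
    auto toUnitLength = [&d]() {
        double sumSq = 0.0;
        for (float v : d) sumSq += static_cast<double>(v) * v;
        if (sumSq <= 0.0) return;  // all-zero descriptor: leave unchanged
        const float inv = static_cast<float>(1.0 / std::sqrt(sumSq));
        for (float& v : d) v *= inv;
    };
    toUnitLength();
    // Clamping limits the influence of large gradient magnitudes
    // (e.g. from non-linear illumination changes).
    for (float& v : d) v = std::min(v, clampValue);
    toUnitLength();
}
```

For example, {3, 4} becomes {0.6, 0.8}, is clamped to {0.2, 0.2}, and ends as roughly {0.7071, 0.7071}.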
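Note 4 can be read as a grow-only allocation policy. The hypothetical PyramidStorage class below only counts reallocations instead of creating real textures. It shows why a batch of same-size or smaller images avoids the allocation cost, and it is not SiftGPU's actual memory manager.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical grow-only storage: capacity only ever increases, so images
// no larger than the biggest one seen so far reuse existing memory.
class PyramidStorage {
public:
    // Returns true if this image forced a (costly) reallocation.
    bool prepare(int width, int height) {
        assert(width > 0 && height > 0);
        if (width <= capWidth_ && height <= capHeight_) return false;
        capWidth_ = std::max(capWidth_, width);
        capHeight_ = std::max(capHeight_, height);
        ++reallocations_;  // a real implementation would recreate textures here
        return true;
    }
    int reallocations() const { return reallocations_; }

private:
    int capWidth_ = 0;
    int capHeight_ = 0;
    int reallocations_ = 0;
};
```

Processing 640x480, 640x480, 320x240 and then 800x600 triggers only two reallocations: the first image and the larger last one.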

References

[1]   D. G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60, November 2004. http://www.cs.ubc.ca/~lowe/keypoints/ .
[2]   A. Vedaldi. sift++. http://vision.ucla.edu/~vedaldi/code/siftpp/siftpp.html .
[3]   G. Ziegler, A. Tevs, C. Theobalt, and H.-P. Seidel. GPU point list generation through histogram pyramids. Technical Report, June 2006.