Fast Exact Search in Hamming Space with Multi-Index Hashing
Version 1.0 - Updated on Aug 21, 2012.
This is a C++/matlab implementation of the Multi-Index Hashing (MIH)
algorithm [1] for fast exact nearest neighbor search in Hamming distance
on binary codes. By using this code, one can reproduce most of the
results explained in the paper on 1 billion and 80 million
datasets. This implementation improves the storage efficiency of our
previous implementation described in the paper by utilizing sparse
hash tables.
You can download the latest version of our code
from github.com/norouzi/mih.
Reference
[1] An extended version of the conference paper,
arxiv.org/abs/1307.2982
[2] Fast Search in Hamming Space with Multi-Index Hashing,
Mohammad Norouzi, Ali Punjani, David Fleet,
IEEE Computer Vision and Pattern Recognition (CVPR), 2012. [pdf]
Abstract: There has been growing interest in mapping image data onto
compact binary codes for fast near neighbor search in vision
applications. Although binary codes are motivated by their use as
direct indices (addresses) into a hash table, codes longer than 32
bits are not being used in this way, as it was thought to be
ineffective. We introduce a rigorous way to build multiple hash tables
on binary code substrings that enables exact K-nearest neighbor search
in Hamming space. The algorithm is straightforward to implement,
storage efficient, and it has sub-linear run-time behavior for
uniformly distributed codes. Empirical results show dramatic speed-ups
over a linear scan baseline and for datasets with up to one billion
items, 64- or 128-bit codes, and search radii up to 25 bits.