lucene-java-user mailing list archives

From Petite Abeille <>
Subject Re: Bet you didn't know Lucene can...
Date Mon, 31 Oct 2011 20:42:28 GMT

On Oct 31, 2011, at 9:32 PM, Andrzej Bialecki wrote:

> similarity-preserving hash function was calculated on each sentence, and the hash was
> added as a field. The property of the hash was that similar documents (sentences) would produce
> a similar hash, with only some bit-level perturbation. The challenge was to find a ranked
> list of possible duplicates with similar (not exact same) hashes, which in this case meant
> finding a ranked list of documents whose hashes have the smallest bit-level distance
> from the query hash.
> The solution is described in SOLR-1918 - Bit-wise scoring field type.
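
The ranking step described in the quote can be sketched in a few lines. This is a minimal illustration, assuming 64-bit hashes and Hamming distance (popcount of the XOR) as the "bit-level distance". It is not the SOLR-1918 field type itself, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of Lucene or Solr.
public class HammingRanking {
    // Bit-level distance: number of bit positions where the two 64-bit hashes differ.
    static int hammingDistance(long a, long b) {
        return Long.bitCount(a ^ b);
    }

    // Returns candidate indices ordered by ascending Hamming distance to the query hash
    // (List.sort is stable, so ties keep their original order).
    static List<Integer> rankBySimilarity(long query, long[] candidates) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < candidates.length; i++) {
            order.add(i);
        }
        order.sort(Comparator.comparingInt(i -> hammingDistance(query, candidates[i])));
        return order;
    }

    public static void main(String[] args) {
        long query = 0b1011_0000L;
        long[] candidates = {0b1011_0001L, 0b0100_1111L, 0b1011_0000L, 0b1111_0000L};
        System.out.println(rankBySimilarity(query, candidates)); // prints [2, 0, 3, 1]
    }
}
```

A linear scan like this only shows the scoring idea; at index scale one would want the distance computed inside the search engine's scoring, which is what the referenced issue is about.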

In other words, a simhash, no?

"Similarity Estimation Techniques from Rounding Algorithms" (Moses Charikar, STOC 2002)
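
For reference, a simhash in the style of Charikar's paper can be sketched as follows: each token is hashed to 64 bits, every token votes +1 or -1 on each bit position, and the final hash keeps the bits with a positive vote. This is a sketch under stated assumptions: the tokenization (lowercase, split on non-word characters), the choice of FNV-1a as the per-token hash, and the class name are all illustrative choices, not anything taken from Lucene, Solr, or SOLR-1918.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical simhash sketch; tokenizer and per-token hash are assumptions.
public class SimHash {
    // 64-bit FNV-1a hash of a token; any well-mixed 64-bit hash could be used instead.
    static long fnv1a64(String token) {
        long h = 0xcbf29ce484222325L; // FNV-1a 64-bit offset basis
        for (byte b : token.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L; // FNV 64-bit prime; overflow wraps mod 2^64
        }
        return h;
    }

    // Every token (repeats included) votes +1 or -1 on each of the 64 bit positions;
    // the result has a bit set only where the vote total is positive.
    static long simhash(String text) {
        int[] votes = new int[64];
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (token.isEmpty()) {
                continue; // split can yield a leading empty string
            }
            long h = fnv1a64(token);
            for (int bit = 0; bit < 64; bit++) {
                votes[bit] += ((h >>> bit) & 1L) != 0 ? 1 : -1;
            }
        }
        long result = 0L;
        for (int bit = 0; bit < 64; bit++) {
            if (votes[bit] > 0) {
                result |= 1L << bit;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        long a = simhash("the quick brown fox jumps over the lazy dog");
        long b = simhash("the quick brown fox jumped over the lazy dog");
        long c = simhash("lucene stores inverted indexes on disk");
        // Near-duplicate sentences tend to differ in fewer bits than unrelated ones.
        System.out.println(Long.bitCount(a ^ b) + " vs " + Long.bitCount(a ^ c));
    }
}
```

Storing the resulting `long` as a field and ranking by `Long.bitCount(query ^ stored)` gives the "smallest bit-level distance" ordering described in the quoted message.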
