lucene-dev mailing list archives

From Doug Cutting <cutt...@lucene.com>
Subject Re: Interesting idea
Date Wed, 10 Jul 2002 16:35:47 GMT
Jon Scott Stevens wrote:
> Adding support to Lucene for Nilsimsa seems like a cool idea...
> 
> http://ixazon.dynip.com/~cmeclax/nilsimsa.html
> 
> The index would be the hash and one could use Lucene to rank searches based
> on the Nilsimsa rating of the results...

Nilsimsa employs a very different model than Lucene, so supporting it 
would require a rewrite of the indexing and search portions of Lucene, 
which is most of the code.

Nilsimsa appears to use what is called a "signature file" approach in 
the literature, while Lucene uses an "inverted file".  A search on 
Google for "signature file versus inverted index" turns up a paper by 
Zobel et al., which concludes:

   Our conclusions are unequivocal. For typical document indexing
   applications, current signature file techniques do not perform well
   compared to current implementations of inverted file indexes.

See: http://www.cs.columbia.edu/~pirot/cs6111/Readings/zobel98.pdf
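
To make the distinction between the two models concrete, here is a minimal 
sketch in Python. It is not Lucene's or Nilsimsa's actual code: the function 
names, the 64-bit signature width, the three bits per term, and the use of 
MD5 are all assumptions chosen for illustration. The signature file stores 
one bit string per document and must scan every signature at query time, 
then verify candidates to discard false positives. The inverted file maps 
each term directly to the documents that contain it.

```python
import hashlib
from collections import defaultdict

SIG_BITS = 64       # signature width in bits (assumed for illustration)
BITS_PER_TERM = 3   # bits set per term (assumed for illustration)


def term_mask(term):
    """Hash a term to a bit mask with up to BITS_PER_TERM bits set."""
    digest = hashlib.md5(term.encode("utf-8")).digest()
    mask = 0
    for i in range(BITS_PER_TERM):
        mask |= 1 << (digest[i] % SIG_BITS)
    return mask


def build_signatures(docs):
    """Signature file: one superimposed bit signature per document."""
    sigs = {}
    for doc_id, text in docs.items():
        sig = 0
        for term in text.lower().split():
            sig |= term_mask(term)
        sigs[doc_id] = sig
    return sigs


def signature_search(sigs, docs, term):
    """Scan every signature, then verify candidates against the text."""
    mask = term_mask(term)
    candidates = [d for d, sig in sigs.items() if (sig & mask) == mask]
    # A matching signature only means "possibly contains the term",
    # so each candidate is checked to remove false positives.
    return sorted(d for d in candidates if term in docs[d].lower().split())


def build_inverted(docs):
    """Inverted file: map each term to the set of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def inverted_search(index, term):
    """Look the term up directly; no scan over documents is needed."""
    return sorted(index.get(term, ()))


docs = {
    1: "lucene uses an inverted file",
    2: "nilsimsa computes a hash",
    3: "an inverted index maps terms to documents",
}
sigs = build_signatures(docs)
index = build_inverted(docs)
```

Both searches return the same documents for a single-term query, but the 
signature search touches every document's signature while the inverted 
search touches only the entry for that term. That difference in access 
pattern is why switching models would mean rewriting both indexing and 
search.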

Doug



