lucene-java-user mailing list archives

From "Dmitry" <>
Subject Detection of index duplicates in Lucene
Date Sun, 29 Jul 2007 06:18:48 GMT
We are trying to find out whether there is any implementation for Lucene
that detects duplicate index entries.
Assume we have a set of documents, where a document is a bunch of words.
After we have created index entries for a document, we need to know that
all entries are unique for that specific document (lexical equivalence).

Is there an implementation that covers both failure cases: the algorithm
failing to detect a real duplicate, and the algorithm reporting a false
duplicate? With that we could find all duplicate index entries.

We could use the same algorithm to detect duplicate documents. That would
save time and improve performance, because we would not need to run the
indexing services against a document that is already indexed.
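
To make the question concrete, here is a minimal sketch of the kind of check we have in mind, assuming "lexically equivalent" means the same sequence of words after lowercasing and ignoring punctuation and whitespace. It uses only the JDK, not any Lucene API; the class DuplicateDetector and its methods normalize, fingerprint and addIfNew are hypothetical names invented for this illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper (not part of Lucene): flags lexically equivalent documents. */
final class DuplicateDetector {

    // Fingerprints of every document accepted so far.
    private final Set<String> seen = new HashSet<>();

    /** Lowercases the text and joins its word tokens with single spaces. */
    static String normalize(String text) {
        StringBuilder sb = new StringBuilder();
        // Split on any run of characters that are neither letters nor digits.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) {
                continue; // leading delimiter or empty input
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(token);
        }
        return sb.toString();
    }

    /** Returns the SHA-256 digest of the normalized text as a hex string. */
    static String fingerprint(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(normalize(text).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    /** Returns true if the document is new (index it), false if it is a duplicate (skip it). */
    boolean addIfNew(String text) {
        return seen.add(fingerprint(text));
    }

    public static void main(String[] args) {
        DuplicateDetector detector = new DuplicateDetector();
        String[] docs = {"Hello, Lucene world!", "hello   lucene WORLD", "Hello Lucene"};
        for (String doc : docs) {
            System.out.println((detector.addIfNew(doc) ? "index: " : "skip:  ") + doc);
        }
    }
}
```

An exact fingerprint like this should practically never report a false duplicate, but it misses documents that differ by even one word, so it only covers the strict lexical-equivalence case. In a real setup the fingerprint would presumably be stored with each indexed document and looked up before indexing a new one; that part is not shown here.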

Any suggestions would be appreciated.


