lucene-java-user mailing list archives

From Karl Wettin <>
Subject Re: lucene suggest
Date Wed, 22 Aug 2007 12:32:22 GMT

On 21 Aug 2007, at 13:10, Jens Grivolla wrote:

> On 8/21/07, Heba Farouk <> wrote:
>> the documents are not duplicated, i mean the hits (assume that 2  
>> documents have the same subject but with different authors, so if  
>> i'm searching the subject, the returned hits will have duplicates )
>> i was asking if i can remove duplicates from the hits??
> You may not want to work with documents at all (where you have the
> duplicates), but rather with the terms in your index directly.  Take a
> look at WildcardTermEnum etc.

My favorite solution for this is a standalone trie, and such a
solution is available in LUCENE-625.
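
To illustrate the trie idea, here is a minimal sketch in plain Java. It is
not the LUCENE-625 code; the class name SuggestTrie and its methods are
hypothetical, and lowercasing the input is an assumption. Because each
phrase is stored once along a single path, adding the same subject twice
still yields a single suggestion, which addresses the duplicate hits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical standalone trie for prefix suggestions (not the LUCENE-625 code).
class SuggestTrie {
    private static final class Node {
        // TreeMap keeps children sorted, so suggestions come out in order.
        final Map<Character, Node> children = new TreeMap<>();
        boolean terminal; // true if a complete phrase ends at this node
    }

    private final Node root = new Node();

    /** Adds a phrase; adding the same phrase again changes nothing. */
    void add(String phrase) {
        Node node = root;
        for (char c : phrase.toLowerCase(Locale.ROOT).toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.terminal = true;
    }

    /** Returns up to max distinct phrases that start with the given prefix. */
    List<String> suggest(String prefix, int max) {
        List<String> out = new ArrayList<>();
        String p = prefix.toLowerCase(Locale.ROOT);
        Node node = root;
        for (char c : p.toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return out; // no phrase has this prefix
            }
        }
        collect(node, new StringBuilder(p), out, max);
        return out;
    }

    // Depth-first walk below the prefix node, stopping once max is reached.
    private void collect(Node node, StringBuilder path, List<String> out, int max) {
        if (out.size() >= max) {
            return;
        }
        if (node.terminal) {
            out.add(path.toString());
        }
        for (Map.Entry<Character, Node> e : node.children.entrySet()) {
            if (out.size() >= max) {
                return;
            }
            path.append(e.getKey());
            collect(e.getValue(), path, out, max);
            path.setLength(path.length() - 1);
        }
    }
}
```

You would feed it the subject of every document (or every logged query)
once at startup and call suggest() on each keystroke.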

Another way is to create an ngram-index.
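
As a rough sketch of what could go into such an index, the snippet below
generates edge n-grams (prefixes) of a title. That is one common variant
for suggestions and an assumption on my part; the message does not say
which kind of n-gram is meant. The class EdgeNGrams is hypothetical and
uses no Lucene API; each gram would be indexed as a term pointing back to
the full title.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: produces the prefixes of a text between two lengths.
class EdgeNGrams {
    /** Returns the lowercased prefixes of text with length minLen..maxLen. */
    static List<String> of(String text, int minLen, int maxLen) {
        String s = text.toLowerCase(Locale.ROOT);
        List<String> grams = new ArrayList<>();
        int lower = Math.max(1, minLen);           // never emit empty grams
        int upper = Math.min(maxLen, s.length());  // cannot exceed the text
        for (int n = lower; n <= upper; n++) {
            grams.add(s.substring(0, n));
        }
        return grams;
    }
}
```

A user typing "luc" then matches the gram "luc" with an exact term lookup
rather than a wildcard scan over the term dictionary.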

It is usually a good idea to create an "a priori" corpus with a
limited set of data. I prefer common user queries to items in the
index, especially if your corpus is large and you have a lot of
server load.
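
One way to build such a limited corpus from user queries, sketched under
assumptions: the query log is available as a list of strings, queries are
normalized by trimming and lowercasing, and only the most frequent ones
are kept. The class QueryLogCorpus and its method are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: picks the most frequent queries from a query log.
class QueryLogCorpus {
    /** Returns up to limit distinct queries, most frequent first. */
    static List<String> topQueries(List<String> queryLog, int limit) {
        Map<String, Integer> counts = new HashMap<>();
        for (String q : queryLog) {
            String key = q.trim().toLowerCase(Locale.ROOT);
            if (!key.isEmpty()) {
                counts.merge(key, 1, Integer::sum);
            }
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // Sort by count descending, then alphabetically to break ties.
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> top = new ArrayList<>();
        for (Map.Entry<String, Integer> e : entries) {
            if (top.size() >= limit) {
                break;
            }
            top.add(e.getKey());
        }
        return top;
    }
}
```

The resulting list is what you would load into the trie or the small
suggestion index, instead of every title in the main index.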

Try LUCENE-550 as an a priori index. My guess is that it would
outperform a RAMDirectory 20x at 25,000 title-sized (40 chars avg)  

