lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Karl Wettin <karl.wet...@gmail.com>
Subject Re: TermQuery search returns the same Document several times
Date Sat, 07 Feb 2009 14:18:15 GMT

5 feb 2009 kl. 14.44 skrev Lebiram:

> If HitCollector only returns a document once then he might be  
> referring to an application ID that is assigned to a field that has  
> been indexed twice or more with different document IDs.
>
> I'll clarify this with him.
>
> However is there a way to somehow do a group by field on the  
> results? That field being the application ID?


There is no built in feature for your request, I think it needs to be  
handled by post processing of the collected documents. I recently  
implemented that for an application:

(Perhaps it is possible to implement in a better way using a function  
query.)

It collects lots of documents and expose them to the consumer via a  
facade that lazily load documents from the IndexReader as they are  
requested. A Set<MyPrimaryKey> keeps track of if the entity already is  
a member of the results but with a greater score.

This means I must estimate the number of total hits (and how many  
documents to collect in order to collect enough entities as requested  
by the client) with the mean number of documents collected per entity  
in an average query.


       karl

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message