lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Robichaud, Jean-Philippe" <Jean-Philippe.Robich...@scansoft.com>
Subject RE: Need a way to set a result limit on a particular field
Date Wed, 15 Jun 2005 18:12:38 GMT

It may be simpler and more effective to use the Hits object and keep the
number of time each host was actually "returned" to the user and skip it if
the limit has been reach.  This way, if your users just look at the 10-20
highest hits, you will save you a lot of processing time, especially if your
index is huge...

Here is some pseudo code stripped from a class I once wrote


Hits hits = iSearcher.search(myQuery);
IntHash hostFreqCount = new IntHash();

int i=0;
int j=0;

while(i < hist.length) {
 j=0;
 for(; (i<hits.length && j < 10); i++,j++) {

  Document doc = iSearcher.doc(hits.doc(i));
  String host_id = doc.get("host_id");
  hostFreqCount.inc(host_id);

   if(hostFreqCount.get(host_id) > 3) continue;

  ///  show the hit to the use...  

 }
}

 
Hope it helped !

Jp


-----Original Message-----
From: Jay Hill [mailto:jayallenhill@gmail.com] 
Sent: Wednesday, June 15, 2005 2:01 PM
To: java-user@lucene.apache.org
Subject: Re: Need a way to set a result limit on a particular field

Thanks Tony and Erik for the replies. The trick is we don't know the
hosts that will be returned in advance, we just don't want more than 3
from any one host. It's not unlike searching on Google where you might
see a link that says "More results from foo.com". We essentially want
to discard any results > 3 for any one host. In some of our searches
we might get high scores on 20 or 30 documents, but we don't want to
show page after page from the same host, we'd rather limit it to 3
from each for more diversity.

I may have to use a brute force approach using HitCollector as Tony
suggests. I was hoping to avoid the HitCollector, but there may be no
other way right now.

Many thanks,
-Jay


On 6/14/05, Erik Hatcher <erik@ehatchersolutions.com> wrote:
> 
> On Jun 14, 2005, at 7:23 PM, Jay Hill wrote:
> > I have a need to limit my Hits returned based on one of the indexed
> > fields. This is a web application and we want to limit the number of
> > hits from any one host. We have a field named "host_id" and I'd like
> > to be able to limit my results to no more than three results for any
> > one host_id.
> 
> I may not be fully understanding your question, but I'll go with my
> assumptions... wrap the users query into a BooleanQuery as a required
> clause and then add another clause with a TermQuery for the specific
> host_id.  Then simply constrain the number of Hits shown to the first
> 3.  Does that do what you're after?
> 
>      Erik
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message