lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jay Hill <jayallenh...@gmail.com>
Subject Re: Need a way to set a result limit on a particular field
Date Wed, 15 Jun 2005 22:07:03 GMT
I like this approach. This may be what I'm looking for.

Thanks JP!
-Jay

On 6/15/05, Robichaud, Jean-Philippe
<Jean-Philippe.Robichaud@scansoft.com> wrote:
> 
> It may be simpler and more effective to use the Hits object and keep the
> number of time each host was actually "returned" to the user and skip it if
> the limit has been reach.  This way, if your users just look at the 10-20
> highest hits, you will save you a lot of processing time, especially if your
> index is huge...
> 
> Here is some pseudo code stripped from a class I once wrote
> 
> 
> Hits hits = iSearcher.search(myQuery);
> IntHash hostFreqCount = new IntHash();
> 
> int i=0;
> int j=0;
> 
> while(i < hist.length) {
>  j=0;
>  for(; (i<hits.length && j < 10); i++,j++) {
> 
>   Document doc = iSearcher.doc(hits.doc(i));
>   String host_id = doc.get("host_id");
>   hostFreqCount.inc(host_id);
> 
>    if(hostFreqCount.get(host_id) > 3) continue;
> 
>   ///  show the hit to the use...
> 
>  }
> }
> 
> 
> Hope it helped !
> 
> Jp
> 
> 
> -----Original Message-----
> From: Jay Hill [mailto:jayallenhill@gmail.com]
> Sent: Wednesday, June 15, 2005 2:01 PM
> To: java-user@lucene.apache.org
> Subject: Re: Need a way to set a result limit on a particular field
> 
> Thanks Tony and Erik for the replies. The trick is we don't know the
> hosts that will be returned in advance, we just don't want more than 3
> from any one host. It's not unlike searching on Google where you might
> see a link that says "More results from foo.com". We essentially want
> to discard any results > 3 for any one host. In some of our searches
> we might get high scores on 20 or 30 documents, but we don't want to
> show page after page from the same host, we'd rather limit it to 3
> from each for more diversity.
> 
> I may have to use a brute force approach using HitCollector as Tony
> suggests. I was hoping to avoid the HitCollector, but there may be no
> other way right now.
> 
> Many thanks,
> -Jay
> 
> 
> On 6/14/05, Erik Hatcher <erik@ehatchersolutions.com> wrote:
> >
> > On Jun 14, 2005, at 7:23 PM, Jay Hill wrote:
> > > I have a need to limit my Hits returned based on one of the indexed
> > > fields. This is a web application and we want to limit the number of
> > > hits from any one host. We have a field named "host_id" and I'd like
> > > to be able to limit my results to no more than three results for any
> > > one host_id.
> >
> > I may not be fully understanding your question, but I'll go with my
> > assumptions... wrap the users query into a BooleanQuery as a required
> > clause and then add another clause with a TermQuery for the specific
> > host_id.  Then simply constrain the number of Hits shown to the first
> > 3.  Does that do what you're after?
> >
> >      Erik
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message