lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "emerson cargnin" <echofloripa.y...@gmail.com>
Subject Re: Restricting the number of docs per search field
Date Tue, 28 Feb 2006 15:51:03 GMT
Yes, the bottleneck is defenitely in lucene. The index is quite big, three
files with more than 1 giga.
We are querying for html extracts, with the id together, but it can return
almost 50 extracts for each ID, and just the first 2 will be used. We could
as well do 10 queries(that's the max number of ids that will be searched),
but Im not sure what would be faster...

thanks for the answer


On 28/02/06, Grant Ingersoll <gsingers@syr.edu> wrote:
>
> Do you want the first 2 docs, regardless of score, with the same
> property or do you want the 2 highest scoring docs with the same property?
>
> You might look at the HitCollector search method on IndexSearcher.  Btw,
> the Filter that is required can be null.  The HitCollector interface
> allows you to apply business rules to a document as it is being scored
> and store them how you see fit.
>
> I am not sure you will see much performance gain, as Lucene is pretty
> fast. Have you done profiling, etc. to determine that Lucene is actually
> the bottleneck?
>
> -Grant
>
>
>
> emerson cargnin wrote:
> > does anyone knows a solution for that?
> > I know theres a method that returns a TopDoc, but it needs a filter, and
> in
> > my case, Ill need the first 2 of each doc with the same value in a given
> > property.
> >
> >
> > On 27/02/06, emerson cargnin <echofloripa.yell@gmail.com> wrote:
> >
> >> Hi all
> >>
> >> Due a performance problem, I'm looking a way of restricting the docs
> >> returned based of the number of docs which a field has the same value.
> At
> >> the moment we just discard the docs if more than a X number for the
> same
> >> field, but I think it can be done by lucence, hence improving a lot the
> >> performance, as now it can return a hundred of docs for the the field
> value,
> >> when just the first 4-5 will be really used.
> >>
> >> thanks
> >> Emerson
> >>
> >>
> >
> >
>
> --
> -------------------------------------------------------------------
> Grant Ingersoll
> Sr. Software Engineer
> Center for Natural Language Processing
> Syracuse University
> School of Information Studies
> 335 Hinds Hall
> Syracuse, NY 13244
>
> http://www.cnlp.org
> Voice:  315-443-5484
> Fax: 315-443-6886
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message