lucene-java-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: performance question - number of documents
Date Sun, 23 Oct 2011 17:18:03 GMT
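"Why would it matter...top 5 matches" Because Lucene has to calculate
the score of every matching document in order to ensure that it returns the
5 best ones. What if the very last document scored was the most relevant?
So the cost grows with the number of matches (about 8 million for your OR
query versus about 5000 for your AND query), not with the number of results
you ask for.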
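
To make that concrete, here is a minimal sketch in plain Java of how a top-k selection over matches can work. This is not Lucene's code: the class name TopKSketch, the method topK, and the scores array (standing in for the score computed as each matching document is visited) are all hypothetical. The point is only that the loop has to visit every match, while the heap never holds more than k entries.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

public class TopKSketch {
    /**
     * Returns the indices of the k highest scores, best first.
     * scores[i] is a stand-in for the score of the i-th matching document.
     */
    static int[] topK(float[] scores, int k) {
        // Min-heap ordered by score: the head is the weakest of the current top k.
        PriorityQueue<Integer> heap =
            new PriorityQueue<>((a, b) -> Float.compare(scores[a], scores[b]));

        // Every match is visited once, no matter how small k is.
        for (int doc = 0; doc < scores.length; doc++) {
            if (heap.size() < k) {
                heap.add(doc);
            } else if (k > 0 && scores[doc] > scores[heap.peek()]) {
                // This match beats the weakest kept entry, so it replaces it.
                heap.poll();
                heap.add(doc);
            }
        }

        // Drain the heap from weakest to best, filling the result back to front.
        int[] result = new int[heap.size()];
        for (int i = result.length - 1; i >= 0; i--) {
            result[i] = heap.poll();
        }
        return result;
    }

    public static void main(String[] args) {
        // The best score belongs to the very last match.
        float[] scores = {0.2f, 0.9f, 0.1f, 0.4f, 1.5f};
        System.out.println(Arrays.toString(topK(scores, 2))); // prints [4, 1]
    }
}
```

In this sketch the memory used depends on k, but the running time depends on the number of matches, which is consistent with the OR query being slower even though only 5 hits are returned.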

Best
Erick

On Sun, Oct 23, 2011 at 3:06 PM, sol myr <solmyr72@yahoo.com> wrote:
> Hi,
>
> We've noticed a Lucene performance phenomenon, and would appreciate an explanation
> from anyone familiar with Lucene internals
>
> (I know Lucene as a user, but haven't looked under its hood).
>
> We have a Lucene index of about 30 million records.
> We ran 2 queries: "AND" and "OR" ("+john +doe" versus "john doe").
> The AND query had much better performance (AND takes about 500 millis, while OR takes
> about 2000 millis).
>
> We wondered whether this has anything to do with the number of potential matches?
> Our AND has only about 5000 matches (5000 documents contain *both* "john" and "doe").
> Our OR has about 8 million matches (8 million documents contain *either* "john" or "doe").
>
>
> Does this explain the performance difference?
> But why would it matter, as long as we take only the top 5 matches
> (indexSearcher.search(query, 5))...?
> Is there any other explanation?
>
> Thanks :)
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
