lucene-java-user mailing list archives

From sol myr <solmy...@yahoo.com>
Subject performance question - number of documents
Date Sun, 23 Oct 2011 13:06:02 GMT
Hi,

We've noticed a Lucene performance phenomenon and would appreciate an explanation from
anyone familiar with Lucene internals (I know Lucene as a user, but haven't looked
under its hood).

We have a Lucene index of about 30 million records.
We ran two queries, an AND and an OR ("+john +doe" versus "john doe").
The AND query performed much better: it takes about 500 ms, while the OR query takes
about 2000 ms.

We wondered whether this has anything to do with the number of potential matches.
Our AND has only about 5000 matches (5000 documents contain *both* "john" and "doe").
Our OR has about 8 million matches (8 million documents contain *either* "john" or "doe").
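
To make the question concrete, here is a toy sketch of what we imagine might be happening. It is not Lucene's code: the class TopKSketch, the search method, and the tiny posting lists are all made up for illustration, and we simply assume each term's matches are stored as a sorted array of document ids. The guess it encodes is that every matching document has to be scored before the best 5 can be chosen, so the work would grow with the number of matches rather than with the number of results kept.

```java
import java.util.PriorityQueue;

public class TopKSketch {
    /** Toy result: how many documents were scored and how many hits were kept. */
    static final class Result {
        final int scored;
        final int kept;
        Result(int scored, int kept) { this.scored = scored; this.kept = kept; }
    }

    /**
     * Walks two sorted doc-id lists in parallel (hypothetical model, not Lucene's API).
     * requireBoth = true models "+john +doe"; false models "john doe".
     * Assumes doc ids are smaller than Integer.MAX_VALUE.
     */
    static Result search(int[] a, int[] b, boolean requireBoth, int k) {
        PriorityQueue<Integer> topK = new PriorityQueue<>(); // min-heap of toy scores
        int i = 0, j = 0, scored = 0;
        while (i < a.length || j < b.length) {
            int da = i < a.length ? a[i] : Integer.MAX_VALUE;
            int db = j < b.length ? b[j] : Integer.MAX_VALUE;
            int doc = Math.min(da, db);
            // Toy score: the number of query terms the document contains.
            int termsMatched = (da == doc ? 1 : 0) + (db == doc ? 1 : 0);
            if (da == doc) i++;
            if (db == doc) j++;
            if (requireBoth && termsMatched < 2) continue; // AND: rejected without scoring
            scored++;
            topK.offer(termsMatched);
            if (topK.size() > k) topK.poll(); // the heap never holds more than k hits
        }
        return new Result(scored, topK.size());
    }

    public static void main(String[] args) {
        int[] john = {1, 3, 5, 7, 9, 11};
        int[] doe = {2, 3, 6, 7, 10, 12};
        Result and = search(john, doe, true, 5);
        Result or = search(john, doe, false, 5);
        System.out.println("AND scored " + and.scored + ", kept " + and.kept);
        System.out.println("OR scored " + or.scored + ", kept " + or.kept);
    }
}
```

In this toy model the AND query scores only the 2 documents containing both terms, while the OR query scores all 10 documents containing either term, even though at most 5 are kept in both cases.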


Does this explain the performance difference?
But why would it matter, as long as we take only the top 5 matches
(indexSearcher.search(query, 5))?
Is there any other explanation?

Thanks :)
