lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Lucene search performance: linear?
Date Tue, 05 Dec 2006 10:33:57 GMT
Zhang, Lisheng wrote:
> Hi,
> 
> I indexed first 220,000, all with a special keyword, I did a simple
> query and only fetched 5 docs, with Hits.length()=220,000.
> 
> Then I indexed 440,000 docs, with the same keyword, query it
> again and fetched a few docs, with Hits.length(0=440,000.
> 
> I found that search time is about linear: 2nd time is about 2 times
> longer than 1st query. I would like to understand:
> 
> Does the linear relation come from score calculation, since we
> have to calculate score one by one? Or other reason?

Yes, Lucene is in fact linear in the # documents that match, for a
single term query.

Lucene is an inverted text index, which means (to first order) each
term lists the document IDs that contain it.

For a single term query, Lucene must walk the full list of documents
containing that term, scoring each one.  For multi-term queries,
Lucene walks each term's list of documents "in synchrony" such that
each document ID is visited at once across all terms.  The time spent
gets trickier in this case because while the final result set could be
small, the amount of IO & documents that had to be visited & checked
in order to compute that result set, could have been large.

> If we have B-tree index I would naively expect a better scalibility?

I don't think Lucene uses B-trees now, and I'm not sure how we could
make use of B-trees such that searching is faster than linear time?

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message