lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Paul Elschot <>
Subject Re: BufferedIndexInput.readByte performance
Date Fri, 26 May 2006 16:26:37 GMT
On Friday 26 May 2006 16:14, Michael Chan wrote:
> Hi,
> I have a 5gb index containing 2mil documents and am trying to run
> 1mil+ queries against it. Most of the queries are SpanQueries and it
> occurs to me that the search performance is quite slow when using 2, 3
> SpanOrQueries nested inside a SpanNearQuery, which in turn is nested
> inside another SpanNearQuery. The response time is around 3-5 seconds
> even when the index is stored as a RAMDirectory, and
> BufferedIndexInput.readByte() appears to be the bottleneck. Is this
> performance typical? As I don't need any sorting of the results and

That is indeed typical.

> only need the number of results returned, is there anything, besides
> Field.setOmitNorm(true), I can modify to improve performance?

A few things might help:
- use getSpans() on the scorer of the query, iterate the resulting Spans
  and count the number of different doc values.
  This saves the scoring and the sorting on score value.
- Sort the queries alphabetically, to try and maximize cache usage.
- Increase the skip interval when creating the index, by default lucene uses
  16, but nutch uses a higher value. I've never done this myself, but
  you could specifically ask on how to do this.
- In case you use ordered SpanNearQuery, the implementation here might be
  somewhat faster:
- Specialize the PriorityQueue used in SpanNearQuery, somewhat more intricate
  than this one for disjunctions:

Paul Elschot

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message