lucene-java-user mailing list archives

From Paul Elschot <paul.elsc...@xs4all.nl>
Subject Re: TermScorer default buffer size
Date Thu, 08 Jan 2009 09:27:09 GMT
John, 

Continuing, see below.

On Wednesday 07 January 2009 14:24:15 Paul Elschot wrote:
> On Wednesday 07 January 2009 07:25:17 John Wang wrote:
> > Hi:
> > 
> >    The default buffer size (for docid,score etc) is 32 in TermScorer.
> > 
> >     We have a large index in which some terms have very dense doc sets. By
> > increasing the buffer size we see very dramatic performance improvements.
> > 
> >     With our index (may not be typical), here are some numbers relating buffer
> > size to performance for our query (a large OR query):
> > 
> >     Buffer size   Improvement
> >     2042          22.0 %
> >     4084          39.1 %
> >     8172          51.1 %
> > 
> >     I understand this may not be suitable for every application, so do you
> > think it makes sense to make this buffer size configurable?
> > 
> 
> Ideally the TermScorer buffer size could be set to a size depending on
> the query structure, but there is no facility for this yet.
> For OR queries larger buffers help, but not for AND queries.
> See also LUCENE-430 on reducing buffer sizes for the underlying
> TermDocs for very sparse doc sets.

It may be possible to change the TermScorer buffer size dynamically.
For OR queries TermScorer.next() is used, and for AND queries
TermScorer.skipTo() is used.
That means that when the buffer runs out during TermScorer.next(),
it could be enlarged, for example by doubling (or quadrupling) the size
up to a configurable maximum of 8K or even 16K (see John's numbers
above). When the buffer runs out during TermScorer.skipTo(), the
buffer size could be left unchanged.

This involves some memory allocation during search.
That is unusual, but it could be worthwhile given the
performance improvement.
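
To make the idea concrete, here is a rough sketch of the growth policy.
It is not a patch against TermScorer: the class GrowingBufferScorer, its
fields and the maxSize parameter are all made up for illustration, and a
sorted int[] stands in for the TermDocs of a single term. Only the
initial size of 32 and the next()/skipTo() distinction come from this
thread.

```java
// Hypothetical sketch: GrowingBufferScorer is NOT Lucene's TermScorer.
// A sorted int[] stands in for the TermDocs postings of one term.
class GrowingBufferScorer {
    static final int INITIAL_SIZE = 32; // default buffer size mentioned in this thread

    private final int[] postings; // sorted doc ids (assumed stand-in for TermDocs)
    private final int maxSize;    // assumed configurable upper bound, e.g. 8192
    private int postingsPos = 0;  // index of the next unread posting
    private int[] docs = new int[INITIAL_SIZE];
    private int pointer = 0;      // next buffer slot to return
    private int pointerMax = 0;   // number of valid buffer slots
    private int doc = -1;         // current doc id

    GrowingBufferScorer(int[] sortedDocIds, int maxSize) {
        this.postings = sortedDocIds;
        this.maxSize = Math.max(maxSize, INITIAL_SIZE);
    }

    int doc() { return doc; }
    int bufferSize() { return docs.length; }

    // Copy the next chunk of postings into the buffer at its current size.
    private boolean refill() {
        int n = Math.min(docs.length, postings.length - postingsPos);
        System.arraycopy(postings, postingsPos, docs, 0, n);
        postingsPos += n;
        pointer = 0;
        pointerMax = n;
        return n > 0;
    }

    // Sequential iteration (the OR case): when a completely filled buffer
    // has been consumed, double it up to maxSize before refilling.
    boolean next() {
        if (pointer >= pointerMax) {
            if (pointerMax == docs.length && docs.length < maxSize) {
                docs = new int[Math.min(docs.length * 2, maxSize)];
            }
            if (!refill()) return false;
        }
        doc = docs[pointer++];
        return true;
    }

    // Skipping (the AND case): the buffer size is left unchanged.
    boolean skipTo(int target) {
        // First look in what remains of the current buffer.
        while (pointer < pointerMax) {
            if (docs[pointer] >= target) { doc = docs[pointer++]; return true; }
            pointer++;
        }
        // Linear scan here only for simplicity; a real TermDocs would skip.
        while (postingsPos < postings.length && postings[postingsPos] < target) {
            postingsPos++;
        }
        if (!refill()) return false;
        doc = docs[pointer++];
        return true;
    }
}
```

With a dense term, repeated next() calls grow the buffer 32, 64, 128, ...
until maxSize is reached, so the allocations are few and happen early in
the scan. A scorer that is only driven by skipTo() stays at 32.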

Regards,
Paul Elschot
