lucene-dev mailing list archives

From "Jason Rutherglen" <jason.rutherg...@gmail.com>
Subject Re: SegmentReader with custom setting of deletedDocs, single reusable FieldsReader
Date Sun, 29 Jun 2008 15:07:29 GMT
bq: Each part of the index (e.g. tis, frq) is actually only covered by a
single file descriptor

The code seems to indicate otherwise.  When a query comes in, it uses a
cloned SegmentTermEnum with its own file descriptor.  After the query
completes, the SegmentTermEnum is closed along with its file descriptor.
With FieldsReader, by contrast, a document call competes with potentially
many other document calls from other threads, and only one may proceed at
a time because they all share a single file descriptor.

public SegmentTermEnum terms() {
  return (SegmentTermEnum)origEnum.clone();
}

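protected Object clone() {
    SegmentTermEnum clone = null;
    try {
        // shallow copy first; without this, clone stays null below
        clone = (SegmentTermEnum) super.clone();
    } catch (CloneNotSupportedException e) {}
    clone.input = (IndexInput) input.clone(); // new file descriptor
    return clone;
}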

bq: using pread may be the answer

Yes.  What about the alternative of increasing the buffer size?  A
ThreadLocal could be used there to reuse byte buffers, since creating new
large buffers would be expensive.
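
To make the two options concrete, here is a minimal sketch (not Lucene
code) that combines them: a positional read through
FileChannel.read(ByteBuffer, long), which is Java's closest equivalent to
pread, into a per-thread buffer held in a ThreadLocal.  The class name
PositionalReader, the 64 KB buffer size, and the use of Java 8 APIs are
assumptions for illustration only.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of Lucene.
final class PositionalReader {
    // Assumed buffer size; tune for the workload.
    private static final int BUFFER_SIZE = 64 * 1024;

    // One reusable buffer per thread, allocated lazily on first use,
    // so large buffers are not re-created on every call.
    private static final ThreadLocal<ByteBuffer> BUFFER =
        ThreadLocal.withInitial(() -> ByteBuffer.allocate(BUFFER_SIZE));

    private final FileChannel channel;

    PositionalReader(Path path) throws IOException {
        this.channel = FileChannel.open(path, StandardOpenOption.READ);
    }

    // Reads up to `length` bytes starting at `position`.  The read takes an
    // explicit offset, so there is no seek on shared state and no lock here.
    byte[] read(long position, int length) throws IOException {
        if (length < 0 || length > BUFFER_SIZE) {
            throw new IllegalArgumentException("length out of range: " + length);
        }
        ByteBuffer buf = BUFFER.get();
        buf.clear();
        buf.limit(length);
        long pos = position;
        while (buf.hasRemaining()) {
            int n = channel.read(buf, pos); // positional read
            if (n < 0) {
                break; // end of file reached before `length` bytes
            }
            pos += n;
        }
        buf.flip();
        // Copy out only the bytes actually read; the big buffer is reused.
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }

    void close() throws IOException {
        channel.close();
    }
}
```

Because the positional read does not touch the channel's own file
position, several threads can call read() on one shared instance without
a synchronized block.  Whether the requests actually proceed in parallel
depends on the platform, as the Windows caveat in the quoted message
notes.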

On Sun, Jun 29, 2008 at 10:47 AM, Yonik Seeley <yonik@apache.org> wrote:

> On Sun, Jun 29, 2008 at 9:42 AM, Jason Rutherglen
> <jason.rutherglen@gmail.com> wrote:
> > IndexReader.document as it is is really a lame duck.  The
> > IndexReader.document call being synchronized at the top level drags
> > down the performance of systems that store data in Lucene.  A single
> > file descriptor for all threads on an index that is constantly
> > returning results with fields is a serious problem.  Users are always
> > complaining about this issue and now I know why.
>
> Each part of the index (e.g. tis, frq) is actually only covered by a
> single file descriptor by default - stored fields aren't unique in
> that regard.
>
> It's probably the case that the stored fields of a given document are
> much less likely to be in OS cache though... and in that case having
> multiple requests in-flight to the disk could improve things.
>
> On anything except Windows, using pread may be the answer (after the
> other synchronization is also removed of course):
> https://issues.apache.org/jira/browse/LUCENE-753
>
> -Yonik
