lucene-dev mailing list archives

From "Ning Li" <>
Subject Re: [jira] Resolved: (LUCENE-709) [PATCH] Enable application-level management of IndexWriter.ramDirectory size
Date Wed, 22 Nov 2006 18:33:06 GMT
I was away so I'm catching up.

If this problem (occasional large documents consuming too much memory)
affects more than a few applications, should it be solved in IndexWriter
itself?

A possible design could be:
First, in addDocument(), compute the byte size of a ram segment after
the ram segment is created. In the synchronized block, when the newly
created segment is added to ramSegmentInfos, also add its byte size to
the total byte size of ram segments.
Then, in maybeFlushRamSegments(), either of two conditions can
trigger a flush: the number of ram segments reaching maxBufferedDocs, or
the total byte size of ram segments exceeding a threshold.
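
To make the two steps concrete, here is a minimal, self-contained sketch in Java. It is not Lucene code: the class RamBufferSketch, the parameter name maxBufferedBytes, and the use of a byte array's length as the "segment size" are assumptions made only for illustration. Only addDocument(), maybeFlushRamSegments(), flushRamSegments(), ramSegmentInfos and maxBufferedDocs are names taken from this thread.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for IndexWriter's buffering logic; not the real class. */
class RamBufferSketch {
    private final int maxBufferedDocs;    // existing count-based limit
    private final long maxBufferedBytes;  // assumed new parameter (illustrative name)

    // Stands in for ramSegmentInfos: one entry (a byte size) per ram segment.
    private final List<Long> ramSegmentSizes = new ArrayList<>();
    private long totalRamBytes = 0;
    private int flushCount = 0;

    RamBufferSketch(int maxBufferedDocs, long maxBufferedBytes) {
        this.maxBufferedDocs = maxBufferedDocs;
        this.maxBufferedBytes = maxBufferedBytes;
    }

    void addDocument(byte[] doc) {
        // Step 1: compute the segment's byte size after it is "created",
        // outside the lock. Here the size is simply the array length.
        long segmentBytes = doc.length;

        synchronized (this) {
            // Register the segment and add its size to the running total.
            ramSegmentSizes.add(segmentBytes);
            totalRamBytes += segmentBytes;
            maybeFlushRamSegments();
        }
    }

    // Step 2: flush when either limit is hit. Called with the lock held.
    private void maybeFlushRamSegments() {
        boolean tooManyDocs = ramSegmentSizes.size() >= maxBufferedDocs;
        boolean tooManyBytes = totalRamBytes > maxBufferedBytes;
        if (tooManyDocs || tooManyBytes) {
            flushRamSegments();
        }
    }

    // A real flush would merge ram segments to disk; here we only reset state.
    private void flushRamSegments() {
        ramSegmentSizes.clear();
        totalRamBytes = 0;
        flushCount++;
    }

    synchronized int getFlushCount() { return flushCount; }
    synchronized long getTotalRamBytes() { return totalRamBytes; }
}
```

The only extra work per document is one size computation and one addition under a lock that is already taken, which is why the overhead should stay small.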

The overhead is very small in this design. Of course, IndexWriter
would have another configurable parameter. :-) But it's nice if an
application could set a limit on the memory it uses to buffer docs.


On 11/21/06, Yonik Seeley (JIRA) <> wrote:
> Yonik Seeley resolved LUCENE-709.
> ---------------------------------
>    Resolution: Fixed
> Committed.  Thanks for bearing with me through this, Chuck!
> > [PATCH] Enable application-level management of IndexWriter.ramDirectory size
> > ----------------------------------------------------------------------------
> >
> >                 Key: LUCENE-709
> >                 URL:
> >             Project: Lucene - Java
> >          Issue Type: Improvement
> >          Components: Index
> >    Affects Versions: 2.0.1
> >         Environment: All
> >            Reporter: Chuck Williams
> >         Attachments: ramdir.patch, ramdir.patch, ramDirSizeManagement.patch, ramDirSizeManagement.patch,
> >                      ramDirSizeManagement.patch, ramDirSizeManagement.patch
> >
> >
> > IndexWriter currently only supports bounding the in-memory index cache using
> > maxBufferedDocs, which limits it to a fixed number of documents.  When document sizes vary
> > substantially, especially when documents cannot be truncated, this leads either to inefficiencies
> > from a too-small value or OutOfMemoryErrors from a too-large value.
> > This simple patch exposes IndexWriter.flushRamSegments(), and provides access to
> > size information about IndexWriter.ramDirectory so that an application can manage this based
> > on total number of bytes consumed by the in-memory cache, thereby allowing a larger number of
> > smaller documents or a smaller number of larger documents.  This can lead to much better performance
> > while eliminating the possibility of OutOfMemoryErrors.
> > The actual job of managing to a size constraint, or any other constraint, is left
> > up to the application.
> > The addition of synchronized to flushRamSegments() is only for safety of an external
> > call.  It has no significant effect on internal calls since they all come from a synchronized
