lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Michael McCandless (JIRA)" <>
Subject [jira] Created: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents
Date Thu, 22 Mar 2007 17:06:34 GMT
improve how IndexWriter uses RAM to buffer added documents

                 Key: LUCENE-843
             Project: Lucene - Java
          Issue Type: Improvement
          Components: Index
    Affects Versions: 2.2
            Reporter: Michael McCandless
         Assigned To: Michael McCandless
            Priority: Minor

I'm working on a new class (MultiDocumentWriter) that writes more than
one document directly into a single Lucene segment, more efficiently
than the current approach.

This only affects the creation of an initial segment from added
documents.  I haven't changed anything after that, eg how segments are

The basic ideas are:

  * Write stored fields and term vectors directly to disk (don't
    use up RAM for these).

  * Gather posting lists & term infos in RAM, but periodically do
    in-RAM merges.  Once RAM is full, flush buffers to disk (and
    merge them later when it's time to make a real segment).

  * Recycle objects/buffers to reduce time/stress in GC.

  * Other various optimizations.

Some of these changes are similar to how KinoSearch builds a segment.
But, I haven't made any changes to Lucene's file format nor added
requirements for a global fields schema.

So far the only externally visible change is a new method
"setRAMBufferSize" in IndexWriter (and setMaxBufferedDocs is
deprecated) so that it flushes according to RAM usage and not a fixed
number documents added.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message