lucene-dev mailing list archives

From "Yonik Seeley (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-709) [PATCH] Enable application-level management of IndexWriter.ramDirectory size
Date Thu, 16 Nov 2006 05:00:38 GMT
    [ http://issues.apache.org/jira/browse/LUCENE-709?page=comments#action_12450269 ] 
            
Yonik Seeley commented on LUCENE-709:
-------------------------------------

>  the contents of the hash table may change during the sizeInBytes() iteration.

Yes, but that's OK.

> Files might be deleted and/or added to the directory concurrently, causing the size to
> be computed from an invalid intermediate state

Synchronizing at that low level doesn't make the computed size more valid though... you need
synchronization at a higher level if you want to say more about what the size you are computing
represents.

Consider the case of two different uncoordinated threads... one adding a new file to the RAMDirectory,
and the other calculating the size of the directory.  In the unsynchronized case, you don't
know if the size will include the new file or not.  If sizeInBytes() is synchronized, you
still don't know which thread will acquire the lock first, so you still don't know if the
size will include the new file.  Synchronizing sizeInBytes() does nothing but add a bottleneck.
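
To make the two-thread case concrete, here is a minimal sketch. SketchDirectory is a hypothetical stand-in written for this illustration, not Lucene's RAMDirectory, and the file names and lengths are made up. Both methods are synchronized, yet the reader may still report either total, because nothing outside the lock decides which thread gets there first.

```java
import java.util.HashMap;
import java.util.Map;

class SizeRaceSketch {
    // Hypothetical stand-in for a RAM-backed directory (not Lucene's RAMDirectory).
    static class SketchDirectory {
        private final Map<String, Long> files = new HashMap<>();

        synchronized void addFile(String name, long length) {
            files.put(name, length);
        }

        // Synchronized, so the iteration never sees a half-updated map,
        // but the lock says nothing about WHICH state it will see.
        synchronized long sizeInBytes() {
            long total = 0;
            for (long length : files.values()) {
                total += length;
            }
            return total;
        }
    }

    // Starts one uncoordinated writer and one uncoordinated reader,
    // then returns the size the reader observed.
    static long observeSize() {
        SketchDirectory dir = new SketchDirectory();
        dir.addFile("existing", 100);
        long[] observed = new long[1];
        Thread writer = new Thread(() -> dir.addFile("new", 50));
        Thread reader = new Thread(() -> observed[0] = dir.sizeInBytes());
        writer.start();
        reader.start();
        try {
            writer.join();
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return observed[0];
    }

    public static void main(String[] args) {
        // May print 100 or 150 depending on which thread wins the lock.
        System.out.println("observed size: " + observeSize());
    }
}
```

If the caller needs to know whether the new file is included, the writer and reader have to be coordinated by the code that owns both, which is the higher-level synchronization mentioned above.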

> Synchronizing on files avoids the problem altogether without much cost as the loop is
> fast.

I disagree that the loop will be fast... simpler loops have proven to take some time:
  LUCENE-388: Improve indexing performance when maxBufferedDocs is
  large by keeping a count of buffered documents rather than
  counting after each document addition.
That was just counting the documents, not the number of files in each segment (which will
be larger).
Consider maxBufferedDocs of 1000 to 10000 with 10 or 20 indexed fields, and you end up with
17000 to 270000 files to calculate the size over.
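
The message does not show how those two figures were derived. One reading that reproduces them, offered here as an assumption, is one single-document segment per buffered document, with seven fixed files per segment plus one file per indexed field. The constant and method names below are hypothetical.

```java
class FileCountEstimate {
    // Assumption (not stated in the message): seven fixed files per
    // single-document segment, plus one per-field file for each indexed field.
    static final int FIXED_FILES_PER_SEGMENT = 7;

    // Estimated number of files sizeInBytes() would have to iterate over.
    static long estimateFiles(int bufferedDocs, int indexedFields) {
        return (long) bufferedDocs * (FIXED_FILES_PER_SEGMENT + indexedFields);
    }

    public static void main(String[] args) {
        System.out.println(estimateFiles(1000, 10));   // low end of the range
        System.out.println(estimateFiles(10000, 20));  // high end of the range
    }
}
```

Under that assumption the low end is 1000 * (7 + 10) = 17000 and the high end is 10000 * (7 + 20) = 270000, matching the range quoted above.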


> [PATCH] Enable application-level management of IndexWriter.ramDirectory size
> ----------------------------------------------------------------------------
>
>                 Key: LUCENE-709
>                 URL: http://issues.apache.org/jira/browse/LUCENE-709
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.0.1
>         Environment: All
>            Reporter: Chuck Williams
>         Attachments: ramDirSizeManagement.patch, ramDirSizeManagement.patch
>
>
> IndexWriter currently only supports bounding of the in-memory index cache using maxBufferedDocs,
> which limits it to a fixed number of documents.  When document sizes vary substantially, especially
> when documents cannot be truncated, this leads either to inefficiencies from a too-small value
> or OutOfMemoryErrors from a too-large value.
> This simple patch exposes IndexWriter.flushRamSegments(), and provides access to size
> information about IndexWriter.ramDirectory so that an application can manage this based on
> total number of bytes consumed by the in-memory cache, thereby allowing a larger number of smaller
> documents or a smaller number of larger documents.  This can lead to much better performance
> while eliminating the possibility of OutOfMemoryErrors.
> The actual job of managing to a size constraint, or any other constraint, is left up
> to the application.
> The addition of synchronized to flushRamSegments() is only for safety of an external
> call.  It has no significant effect on internal calls since they all come from a synchronized
> caller.
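
The application-level management described in the issue might look roughly like the sketch below. Only flushRamSegments() is named in the issue text. The BufferingWriter interface, the ramSizeInBytes() accessor, and the ToyWriter class are hypothetical names introduced for illustration and are not taken from the patch.

```java
class SizeBoundedIndexing {
    // Hypothetical view of the writer. Only flushRamSegments() appears in the
    // issue text; ramSizeInBytes() is an assumed name for the size accessor.
    interface BufferingWriter {
        void addDocument(String doc);
        long ramSizeInBytes();
        void flushRamSegments();
    }

    // Adds a document, then flushes if the in-memory cache has reached the budget.
    static void addWithBudget(BufferingWriter writer, String doc, long maxRamBytes) {
        writer.addDocument(doc);
        if (writer.ramSizeInBytes() >= maxRamBytes) {
            writer.flushRamSegments();
        }
    }

    // Toy in-memory writer that treats each character as one buffered byte.
    static class ToyWriter implements BufferingWriter {
        private long buffered = 0;
        int flushes = 0;

        public void addDocument(String doc) { buffered += doc.length(); }
        public long ramSizeInBytes() { return buffered; }
        public void flushRamSegments() { buffered = 0; flushes++; }
    }

    // Feeds all documents through addWithBudget and reports how many flushes occurred.
    static int countFlushes(String[] docs, long maxRamBytes) {
        ToyWriter writer = new ToyWriter();
        for (String doc : docs) {
            addWithBudget(writer, doc, maxRamBytes);
        }
        return writer.flushes;
    }

    public static void main(String[] args) {
        System.out.println(countFlushes(new String[] {"aaaa", "bbbb", "cccc"}, 8));
    }
}
```

The flush decision depends on bytes rather than document count, so many small documents or a few large ones can be buffered under the same budget.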

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

