lucene-dev mailing list archives

From "Chuck Williams (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-709) [PATCH] Enable application-level management of IndexWriter.ramDirectory size
Date Thu, 16 Nov 2006 07:30:38 GMT
    [ http://issues.apache.org/jira/browse/LUCENE-709?page=comments#action_12450301 ] 
            
Chuck Williams commented on LUCENE-709:
---------------------------------------

I hadn't considered the case of such large values for maxBufferedDocs, and agree that the
loop execution time is non-trivial in such cases.  Incremental management of the size seems
most important, especially considering that this will also eliminate the cost of the synchronization.
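A minimal sketch of what incremental size management could look like, assuming a simplified stand-in for the directory. The class SizeTrackingDirectory and its methods are hypothetical and are not Lucene's RAMDirectory API. The running total is adjusted on every write and delete, so reading the size needs neither a loop over the files nor a lock on the file table.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for a RAM-backed directory; not Lucene's RAMDirectory. */
final class SizeTrackingDirectory {
    // File name -> current length in bytes.
    private final Map<String, Long> fileLengths = new ConcurrentHashMap<>();
    // Running total, updated on every append and delete.
    private final AtomicLong totalBytes = new AtomicLong();

    /** Records that count bytes were appended to the named file. */
    void append(String name, long count) {
        fileLengths.merge(name, count, Long::sum);
        totalBytes.addAndGet(count);
    }

    /** Removes a file, if present, and subtracts its length from the total. */
    void delete(String name) {
        Long removed = fileLengths.remove(name);
        if (removed != null) {
            totalBytes.addAndGet(-removed);
        }
    }

    /** Constant time: reads the running total instead of summing all files. */
    long sizeInBytes() {
        return totalBytes.get();
    }
}
```

The sketch uses java.util.concurrent for brevity; the same bookkeeping could be done with a plain long field updated inside the methods that are already synchronized.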

I still think the synchronization adds safety since it guarantees that the loop sees a state
of the directory that did exist at some time.  At that time, the directory did have the reported
size.  Without the synchronization the loop may compute a size for a set of files that never
comprised the contents of the directory at any instant.  Consider this case:

  1.  Thread 1 adds a new document, creating a new segment with new index files.  This leads to
segment merging, which creates new, larger segment index files and then deletes all replaced
segment index files.  Thread 1 then adds a second document, creating new segment index files.
  2.  Thread 2 is computing sizeInBytes and happens to see a state where all the new files
from both the first and second documents are added, but the deletions are not seen.  This
could happen if the deleted files happen to be earlier in the hash array than the added files
for either document.

In this case sizeInBytes() without the synchronization computes a larger size for the directory
than ever actually existed.
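The interleaving above can be replayed deterministically. The following is a single-threaded sketch with made-up file sizes, where an array of slots stands in for the hash array; the class TornSizeScan is hypothetical and is not part of Lucene. The "reader" sums the first two slots, the "writer" then performs the merge, the deletions, and the second add, and the reader resumes with the remaining slots.

```java
/** Hypothetical replay of an unsynchronized size scan; not Lucene code. */
final class TornSizeScan {
    /** Sum of all occupied slots: the true directory size at one instant. */
    static long trueSize(Long[] slots) {
        long sum = 0;
        for (Long len : slots) {
            if (len != null) {
                sum += len;
            }
        }
        return sum;
    }

    /** Returns {total seen by the scan, largest size that ever really existed}. */
    static long[] simulate() {
        // Slots 0 and 1 hold two old segment files; slots 2 and 3 are empty.
        Long[] slots = {10L, 10L, null, null};
        long largest = trueSize(slots);

        // Reader: visits slots 0 and 1 before the writer runs.
        long scanned = slots[0] + slots[1];

        // Writer: the merge adds a larger segment file in slot 2 ...
        slots[2] = 20L;
        largest = Math.max(largest, trueSize(slots));
        // ... deletes the replaced files, which the reader has already counted ...
        slots[0] = null;
        slots[1] = null;
        largest = Math.max(largest, trueSize(slots));
        // ... and the second document adds a new file in slot 3.
        slots[3] = 10L;
        largest = Math.max(largest, trueSize(slots));

        // Reader: resumes and sees both additions but neither deletion.
        scanned += slots[2] + slots[3];

        return new long[] {scanned, largest};
    }
}
```

With these sizes the scan reports 50 bytes, while the directory never held more than 40 bytes at any instant.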

Re. RAMDirectory.fileLength(), it is not used within Lucene at all, but it is public, and
the restriction that it is not valid while index operations are happening concurrently is not
specified.  I think that is a bug.

I'll rethink the patch based on your observations, Yonik, and resubmit.  Thanks.


> [PATCH] Enable application-level management of IndexWriter.ramDirectory size
> ----------------------------------------------------------------------------
>
>                 Key: LUCENE-709
>                 URL: http://issues.apache.org/jira/browse/LUCENE-709
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.0.1
>         Environment: All
>            Reporter: Chuck Williams
>         Attachments: ramDirSizeManagement.patch, ramDirSizeManagement.patch
>
>
> IndexWriter currently only supports bounding of the in-memory index cache using maxBufferedDocs,
which limits it to a fixed number of documents.  When document sizes vary substantially, especially
when documents cannot be truncated, this leads either to inefficiencies from a too-small value
or OutOfMemoryErrors from a too-large value.
> This simple patch exposes IndexWriter.flushRamSegments(), and provides access to size
information about IndexWriter.ramDirectory so that an application can manage this based on
total number of bytes consumed by the in-memory cache, thereby allowing a larger number of smaller
documents or a smaller number of larger documents.  This can lead to much better performance
while eliminating the possibility of OutOfMemoryErrors.
> The actual job of managing to a size constraint, or any other constraint, is left up to
the application.
> The addition of synchronized to flushRamSegments() is only for safety of an external
call.  It has no significant effect on internal calls since they all come from a synchronized
caller.
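
As an illustration of the application-side policy the description leaves open, here is a minimal sketch that flushes whenever the buffered bytes reach a limit. The interface RamBufferedWriter, its method ramSizeInBytes(), and the class ByteBoundedIndexer are hypothetical names chosen for this example; only flushRamSegments() is named in the description, and the actual size accessor added by the patch may differ.

```java
/** Hypothetical view of what the patch exposes; not Lucene's IndexWriter API. */
interface RamBufferedWriter {
    void addDocument(String doc);

    /** Assumed accessor for the bytes currently held by the in-memory cache. */
    long ramSizeInBytes();

    void flushRamSegments();
}

/** Application-level policy: bound the in-memory cache by bytes, not by doc count. */
final class ByteBoundedIndexer {
    private final RamBufferedWriter writer;
    private final long maxBufferedBytes;

    ByteBoundedIndexer(RamBufferedWriter writer, long maxBufferedBytes) {
        this.writer = writer;
        this.maxBufferedBytes = maxBufferedBytes;
    }

    /** Adds one document, then flushes if the cache has reached the byte limit. */
    void add(String doc) {
        writer.addDocument(doc);
        if (writer.ramSizeInBytes() >= maxBufferedBytes) {
            writer.flushRamSegments();
        }
    }
}
```

Because the check runs after each add, the cache can exceed the limit by at most one document before it is flushed.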
