lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Yonik Seeley (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-709) [PATCH] Enable application-level management of IndexWriter.ramDirectory size
Date Wed, 15 Nov 2006 15:58:40 GMT
    [ http://issues.apache.org/jira/browse/LUCENE-709?page=comments#action_12450079 ] 
            
Yonik Seeley commented on LUCENE-709:
-------------------------------------

Thinking a little further on this:
Synchronizing on the Hashtable here does not solve the whole problem, it only slows things
down.  The problem isn't the Hashtable (using an Enumerator rather than an Iterator would
solve the fail-fast concurrent modification thing).

The problem is unsynchronized access to RAMFile.length
RAMFile and IndexInput/IndexOutput aren't meant to be MT-safe.
The correct solution would be to synchronize that (have a RAMFile.getLength(), and a RAMFile.setLength())

The question is... is it worth it?  Probably...
I don't think the cost should be too bad since RAMInputStream makes a local copy of the length,
and RAMOutputStream inherits from BufferedOutputStream and only updates the length every buffer
flush.

> [PATCH] Enable application-level management of IndexWriter.ramDirectory size
> ----------------------------------------------------------------------------
>
>                 Key: LUCENE-709
>                 URL: http://issues.apache.org/jira/browse/LUCENE-709
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.0.1
>         Environment: All
>            Reporter: Chuck Williams
>         Attachments: ramDirSizeManagement.patch, ramDirSizeManagement.patch
>
>
> IndexWriter currently only supports bounding of in the in-memory index cache using maxBufferedDocs,
which limits it to a fixed number of documents.  When document sizes vary substantially, especially
when documents cannot be truncated, this leads either to inefficiencies from a too-small value
or OutOfMemoryErrors from a too large value.
> This simple patch exposes IndexWriter.flushRamSegments(), and provides access to size
information about IndexWriter.ramDirectory so that an application can manage this based on
total number of bytes consumed by the in-memory cache, thereby allow a larger number of smaller
documents or a smaller number of larger documents.  This can lead to much better performance
while elimianting the possibility of OutOfMemoryErrors.
> The actual job of managing to a size constraint, or any other constraint, is left up
the applicatation.
> The addition of synchronized to flushRamSegments() is only for safety of an external
call.  It has no significant effect on internal calls since they all come from a sychronized
caller.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Mime
View raw message