lucene-dev mailing list archives

From "Yonik Seeley (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-709) [PATCH] Enable application-level management of IndexWriter.ramDirectory size
Date Fri, 10 Nov 2006 14:48:38 GMT
    [ http://issues.apache.org/jira/browse/LUCENE-709?page=comments#action_12448758 ] 
            
Yonik Seeley commented on LUCENE-709:
-------------------------------------

> That code too was without the thread-safety measure Yonik suggests so I don't know what
overhead that will add.

Switching to an enumerator should be negligibly faster, since Hashtable's iterator is implemented
as its enumerator plus extra concurrent-modification checks. That might not be sufficient
for total thread safety, though.
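
To make the iterator/enumerator difference concrete, here is a minimal single-threaded sketch. The class and method names are made up for illustration, and the put() issued mid-traversal stands in for a write from another thread. It only shows that the Iterator checks for modification while the Enumeration does not; it says nothing about memory visibility.

```java
import java.util.ConcurrentModificationException;
import java.util.Enumeration;
import java.util.Hashtable;
import java.util.Iterator;

// Hypothetical demo class; not part of Lucene or of the patch.
final class HashtableTraversalDemo {
    private static Hashtable<String, Long> sample() {
        Hashtable<String, Long> table = new Hashtable<>();
        table.put("a", 1L);
        table.put("b", 2L);
        table.put("c", 3L);
        return table;
    }

    /** Returns true if the Iterator notices the put() and throws. */
    static boolean iteratorFailsFast() {
        Hashtable<String, Long> table = sample();
        Iterator<Long> it = table.values().iterator();
        it.next();
        table.put("d", 4L); // structural modification during traversal
        try {
            it.next();
            return false;
        } catch (ConcurrentModificationException expected) {
            return true;
        }
    }

    /** Returns true if the Enumeration finishes without throwing. */
    static boolean enumerationTolerates() {
        Hashtable<String, Long> table = sample();
        Enumeration<Long> en = table.elements();
        en.nextElement();
        table.put("d", 4L); // same modification; no check is performed
        try {
            while (en.hasMoreElements()) {
                en.nextElement();
            }
            return true;
        } catch (ConcurrentModificationException unexpected) {
            return false;
        }
    }
}
```

Whether the Enumeration also returns the newly added entry is not something the sketch relies on.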

Enumerating through the Hashtable while not synchronized means you can encounter an object
that was just added by another thread. The other thread synchronized while adding the new
object, but the enumerating thread didn't execute a read barrier. The new memory model provides
"out-of-thin-air safety" and "initialization safety" guarantees. Thus, we are guaranteed
to see a complete instance of RAMFile (just not necessarily a current one). In this specific use case,
I think it boils down to whether updating the long length is atomic, which we can't guarantee on
all platforms. Your count could be off by 4GB if you "see" the bottom 32 bits before the
top.
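
The 4GB figure can be checked with plain arithmetic. The sketch below does not reproduce a real race; it simply builds the value a reader would get if it combined the new low 32 bits with the old high 32 bits. The class and method names are illustrative only. (The Java Language Specification permits a non-volatile long write to be split into two 32-bit writes, which is what makes such a read possible in principle.)

```java
// Hypothetical demo class; simulates a torn 64-bit read deterministically.
final class TornLongDemo {
    /** Combines the high 32 bits of one value with the low 32 bits of another. */
    static long tornRead(long highSource, long lowSource) {
        long high = highSource & 0xFFFFFFFF00000000L;
        long low = lowSource & 0x00000000FFFFFFFFL;
        return high | low;
    }

    public static void main(String[] args) {
        long oldLength = 0x00000000FFFFFFFFL; // 4GB minus one byte
        long newLength = 0x0000000100000000L; // exactly 4GB

        // The reader "sees" the new bottom word before the new top word.
        long torn = tornRead(oldLength, newLength);

        System.out.println("torn value  = " + torn);
        System.out.println("error bytes = " + (newLength - torn));
    }
}
```

With these inputs the torn value is 0, so the error against the new length is exactly 2^32 bytes. Lengths that fit in the low 32 bits never touch the high word, which is why small files are not affected.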

In this IndexWriter use case, though, we should never see a long length that uses both 32-bit
words, because we are talking about single segments.

Bottom line (I think): if you want getSizeBytes to work correctly 100% of the time in *all*
instances and on all platforms, you need to synchronize it (and hence block any gets/puts during
that time... blech).
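
A rough sketch of the synchronized variant, using a stand-in class rather than the real RAMDirectory/RAMFile (every name here other than getSizeBytes is an assumption). Hashtable's own methods lock on the table instance, so holding that same monitor while summing blocks concurrent gets and puts for the duration of the loop.

```java
import java.util.Enumeration;
import java.util.Hashtable;

// Hypothetical stand-in for a RAM-backed directory; not the patch's code.
final class SizedDirectorySketch {
    /** Stand-in for RAMFile; length is fixed at construction in this sketch. */
    static final class FileEntry {
        final long length;

        FileEntry(long length) {
            this.length = length;
        }
    }

    private final Hashtable<String, FileEntry> files = new Hashtable<>();

    void putFile(String name, long length) {
        files.put(name, new FileEntry(length));
    }

    /** Sums file lengths while holding the Hashtable's own lock. */
    long getSizeBytes() {
        synchronized (files) { // same monitor Hashtable.get/put use
            long total = 0;
            Enumeration<FileEntry> e = files.elements();
            while (e.hasMoreElements()) {
                total += e.nextElement().length;
            }
            return total;
        }
    }
}
```

Note the simplification: if a file's length were mutable and updated by a writer that does not hold this lock, the sum could still read a stale or torn value, so that update path would need its own synchronization as well.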



> [PATCH] Enable application-level management of IndexWriter.ramDirectory size
> ----------------------------------------------------------------------------
>
>                 Key: LUCENE-709
>                 URL: http://issues.apache.org/jira/browse/LUCENE-709
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.0.1
>         Environment: All
>            Reporter: Chuck Williams
>         Attachments: ramDirSizeManagement.patch
>
>
> IndexWriter currently only supports bounding of the in-memory index cache using maxBufferedDocs,
which limits it to a fixed number of documents.  When document sizes vary substantially, especially
when documents cannot be truncated, this leads either to inefficiencies from a too-small value
or OutOfMemoryErrors from a too-large value.
> This simple patch exposes IndexWriter.flushRamSegments(), and provides access to size
information about IndexWriter.ramDirectory so that an application can manage this based on
total number of bytes consumed by the in-memory cache, thereby allowing a larger number of smaller
documents or a smaller number of larger documents.  This can lead to much better performance
while eliminating the possibility of OutOfMemoryErrors.
> The actual job of managing to a size constraint, or any other constraint, is left up
to the application.
> The addition of synchronized to flushRamSegments() is only for safety of an external
call.  It has no significant effect on internal calls since they all come from a synchronized
caller.
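
The application-side policy described in the issue might look roughly like the following. This is a sketch against a hypothetical SizedWriter interface, not the real IndexWriter API: only the flushRamSegments() name comes from the issue text, while ramSizeBytes(), addDocument(String), and FakeWriter are placeholders invented for the example.

```java
// Hypothetical sketch of byte-based flushing; not the patch itself.
final class RamBudgetSketch {
    /** Placeholder for the parts of the writer an application would use. */
    interface SizedWriter {
        void addDocument(String doc);
        long ramSizeBytes();      // assumed accessor for buffered size
        void flushRamSegments();  // name taken from the issue description
    }

    /** Adds all documents, flushing whenever the buffered size reaches the budget. */
    static int addAll(SizedWriter writer, Iterable<String> docs, long maxRamBytes) {
        int flushes = 0;
        for (String doc : docs) {
            writer.addDocument(doc);
            if (writer.ramSizeBytes() >= maxRamBytes) {
                writer.flushRamSegments();
                flushes++;
            }
        }
        return flushes;
    }

    /** Toy writer that counts one byte per character, for demonstration only. */
    static final class FakeWriter implements SizedWriter {
        private long buffered;

        public void addDocument(String doc) {
            buffered += doc.length();
        }

        public long ramSizeBytes() {
            return buffered;
        }

        public void flushRamSegments() {
            buffered = 0;
        }
    }
}
```

Checking after each add means the buffer can overshoot the budget by at most one document, which is the trade-off of this simple policy.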

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

