lucene-dev mailing list archives

From "Chuck Williams (JIRA)" <j...@apache.org>
Subject [jira] Updated: (LUCENE-709) [PATCH] Enable application-level management of IndexWriter.ramDirectory size
Date Fri, 17 Nov 2006 21:11:38 GMT
     [ http://issues.apache.org/jira/browse/LUCENE-709?page=all ]

Chuck Williams updated LUCENE-709:
----------------------------------

    Attachment: ramDirSizeManagement.patch

I've just attached my version of this patch.  It includes a multi-threaded test case.  I believe
it is sound.

A few notes:

1.  Re. Yonik's comment about my synchronization scenario.  Synchronizing as described does
resolve the issue.  No higher-level synchronization is required.  It doesn't matter how concurrent
operations on the directory are ordered or interleaved, so long as any computation that does
a loop sees some instance of the directory that corresponds to its actual content at some point
in time.  The result of the loop will then be accurate for that instant.

2.  Lucene has this same synchronization bug today in RAMDirectory.list().  It can return
a list of files that never comprised the contents of the directory.  This is fixed in the
attached patch.

3.  Also, the long synchronization bug exists in RAMDirectory.fileModified() as well as RAMDirectory.fileLength()
since both are public.  These are fixed in the attached patch.

4.  I moved the synchronization off the Hashtable (replacing it with a HashMap) up to the
RAMDirectory, as there are some operations that require synchronization at the directory level.
Using just one lock seems better.  As all Hashtable operations were already synchronized,
I don't believe any material additional synchronization is added.

5.  Lucene currently makes the assumption that if a file is being written by a stream then
no other streams are simultaneously reading or writing it.  I've maintained this assumption
as an optimization, allowing the streams to access fields directly without synchronization.
This is documented in the comments, as is the locking order.

6.  sizeInBytes is now maintained incrementally, efficiently.

7.  Yonik, your version (which I just now saw) has a bug in RAMDirectory.renameFile().  The
"to" file may already exist, in which case it is overwritten and its size must be subtracted.
I actually hit this in my test case for my implementation and fixed it (since Lucene renames
a new version of the segments file).
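
To make notes 2, 4, 6 and 7 concrete, here is a minimal sketch of the idea, not the code from the attached patch.  SketchRamDirectory and its writeFile() method are hypothetical names invented for illustration; only list(), fileLength(), renameFile() and sizeInBytes mirror names mentioned above.  It assumes a plain HashMap guarded by a single directory-level lock, and a running byte count that is adjusted when a rename overwrites an existing file.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a RAM-backed directory; NOT Lucene's RAMDirectory.
class SketchRamDirectory {
    // Unsynchronized map; every access goes through the directory's own lock.
    private final Map<String, byte[]> files = new HashMap<>();
    // Running total of file bytes, guarded by the same lock.
    private long sizeInBytes;

    // Hypothetical helper: store a whole file at once.
    synchronized void writeFile(String name, byte[] data) {
        byte[] old = files.put(name, data.clone());
        if (old != null) {
            sizeInBytes -= old.length; // replaced content no longer counts
        }
        sizeInBytes += data.length;
    }

    // The snapshot is built while holding the lock, so it always matches
    // the directory's actual contents at one instant.
    synchronized String[] list() {
        return files.keySet().toArray(new String[0]);
    }

    synchronized long fileLength(String name) {
        byte[] data = files.get(name);
        if (data == null) {
            throw new IllegalArgumentException("no such file: " + name);
        }
        return data.length;
    }

    synchronized void renameFile(String from, String to) {
        byte[] data = files.remove(from);
        if (data == null) {
            throw new IllegalArgumentException("no such file: " + from);
        }
        byte[] overwritten = files.put(to, data);
        if (overwritten != null) {
            // The destination already existed: subtract its size.
            sizeInBytes -= overwritten.length;
        }
    }

    synchronized long sizeInBytes() {
        return sizeInBytes;
    }
}
```

Because the count is updated inside the same synchronized methods that change the map, a reader of sizeInBytes() never sees a total that disagrees with the file table.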

All Lucene tests, including the new test, pass.  Some contrib tests fail; I believe none of
these failures are in any way related to this patch.




> [PATCH] Enable application-level management of IndexWriter.ramDirectory size
> ----------------------------------------------------------------------------
>
>                 Key: LUCENE-709
>                 URL: http://issues.apache.org/jira/browse/LUCENE-709
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.0.1
>         Environment: All
>            Reporter: Chuck Williams
>         Attachments: ramdir.patch, ramdir.patch, ramDirSizeManagement.patch, ramDirSizeManagement.patch,
ramDirSizeManagement.patch
>
>
> IndexWriter currently only supports bounding of the in-memory index cache using maxBufferedDocs,
which limits it to a fixed number of documents.  When document sizes vary substantially, especially
when documents cannot be truncated, this leads either to inefficiencies from a too-small value
or OutOfMemoryErrors from a too-large value.
> This simple patch exposes IndexWriter.flushRamSegments(), and provides access to size
information about IndexWriter.ramDirectory so that an application can manage this based on
total number of bytes consumed by the in-memory cache, thereby allowing a larger number of smaller
documents or a smaller number of larger documents.  This can lead to much better performance
while eliminating the possibility of OutOfMemoryErrors.
> The actual job of managing to a size constraint, or any other constraint, is left up to
the application.
> The addition of synchronized to flushRamSegments() is only for safety of an external
call.  It has no significant effect on internal calls since they all come from a synchronized
caller.
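
As an illustration of the application-level management described in the issue, the following sketch flushes whenever the buffered bytes reach a limit.  It does not use the real IndexWriter API: SizeAwareWriter, ramSizeInBytes(), ByteBoundedIndexer and FakeWriter are all hypothetical names, and the accessor the patch actually exposes may look different.  Only the flushRamSegments() name is taken from the text above.

```java
// Hypothetical view of a writer that exposes its buffer size and a flush call.
interface SizeAwareWriter {
    void addDocument(String doc);
    long ramSizeInBytes();      // assumed accessor name, not from the patch
    void flushRamSegments();
}

// Application-side policy: bound the in-memory cache by bytes, not by doc count.
class ByteBoundedIndexer {
    private final SizeAwareWriter writer;
    private final long maxRamBytes;

    ByteBoundedIndexer(SizeAwareWriter writer, long maxRamBytes) {
        this.writer = writer;
        this.maxRamBytes = maxRamBytes;
    }

    void add(String doc) {
        writer.addDocument(doc);
        // Many small documents or a few large ones both flush at the same byte limit.
        if (writer.ramSizeInBytes() >= maxRamBytes) {
            writer.flushRamSegments();
        }
    }
}

// Toy writer used only to exercise the policy; it counts characters as bytes.
class FakeWriter implements SizeAwareWriter {
    long buffered;
    int flushes;

    public void addDocument(String doc) { buffered += doc.length(); }
    public long ramSizeInBytes() { return buffered; }
    public void flushRamSegments() { buffered = 0; flushes++; }
}
```

The policy lives entirely in the application, which matches the stated intent that managing to a size constraint is not the writer's job.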

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        


