lucene-dev mailing list archives

From "Michael McCandless (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-3659) Improve Javadocs of RAMDirectory to document its limitations and add improvements to make it more GC friendly on large indexes
Date Mon, 26 Mar 2012 14:00:31 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-3659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238402#comment-13238402 ]

Michael McCandless commented on LUCENE-3659:
--------------------------------------------

This looks great Uwe!

I'm a little worried about the tiny file case; you're checking for
SEGMENTS_* now, but many other files can be much smaller than 1/64th
of the estimated segment size.

I wonder if we should "improve" IOContext to hold the [rough]
estimated file size (not just overall segment size)... the thing is
that's sort of a hassle on codec impls.

Or: maybe, on closing the ROS/RAMFile, we can downsize the final
buffer (yes, this means copying the bytes, but that cost is vanishingly
small as the RAMDir grows).  Then tiny files stay tiny, though they
are still [relatively] costly to create...
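
A minimal sketch of the downsize-on-close idea, assuming a simplified
writer with fixed-size buffers. TrimmingBufferWriter and its methods
are hypothetical names for illustration, not the actual
RAMOutputStream/RAMFile API:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a RAMOutputStream-like writer (not Lucene's class).
final class TrimmingBufferWriter {
    private final int bufferSize;
    private final List<byte[]> buffers = new ArrayList<>();
    private int posInLast; // bytes used in the last buffer

    TrimmingBufferWriter(int bufferSize) {
        this.bufferSize = bufferSize;
    }

    void writeByte(byte b) {
        // Allocate a new full-size buffer when there is none or the last is full.
        if (buffers.isEmpty() || posInLast == bufferSize) {
            buffers.add(new byte[bufferSize]);
            posInLast = 0;
        }
        buffers.get(buffers.size() - 1)[posInLast++] = b;
    }

    // On close, copy the last buffer down to the bytes actually written,
    // so a 10-byte file holds 10 bytes rather than a whole buffer.
    List<byte[]> close() {
        if (!buffers.isEmpty() && posInLast < bufferSize) {
            int last = buffers.size() - 1;
            buffers.set(last, Arrays.copyOf(buffers.get(last), posInLast));
        }
        return buffers;
    }
}
```

The copy touches at most one buffer per file, which is why the cost
stays small relative to the total bytes written; the full-size
allocation while the file is open is still paid.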

I don't think RAMDir.createOutput should publish the RAMFile until the
ROS is closed?  Ie, you are not allowed to openInput on something
still opened with createOutput in any Lucene Dir impl..?  This would
allow us to make RAMFile frozen (eg if ROS holds its own buffers and
then creates RAMFile on close), so that reading requires no sync?
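
A rough sketch of publish-on-close, assuming a toy directory where the
output keeps its own bytes and only registers an immutable file when it
is closed. PublishOnCloseDir, Output and FrozenFile are hypothetical
names, and the IllegalStateException is a placeholder for whatever a
real Directory would throw:

```java
import java.io.ByteArrayOutputStream;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Immutable once constructed, so readers need no synchronization.
final class FrozenFile {
    private final byte[] bytes;

    FrozenFile(byte[] bytes) {
        this.bytes = bytes.clone(); // defensive copy; never modified afterwards
    }

    int length() { return bytes.length; }

    byte byteAt(int pos) { return bytes[pos]; }
}

// Hypothetical toy directory, not Lucene's RAMDirectory.
final class PublishOnCloseDir {
    private final Map<String, FrozenFile> files = new ConcurrentHashMap<>();

    Output createOutput(String name) {
        return new Output(name); // nothing is visible to readers yet
    }

    FrozenFile openInput(String name) {
        FrozenFile f = files.get(name);
        if (f == null) {
            throw new IllegalStateException("no such file, or still open for writing: " + name);
        }
        return f;
    }

    final class Output implements AutoCloseable {
        private final String name;
        private final ByteArrayOutputStream pending = new ByteArrayOutputStream();

        Output(String name) { this.name = name; }

        void writeByte(byte b) { pending.write(b); }

        // The file becomes visible only here, already frozen.
        @Override
        public void close() {
            files.put(name, new FrozenFile(pending.toByteArray()));
        }
    }
}
```

Because the writer's buffers are private until close, the only
cross-thread hand-off is the map put, and the published object is never
mutated after that.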

I also don't think RAMFile should be public, ie, the only way to make
changes to a file stored in a RAMDir is via RAMOutputStream.  We can
do this separately...

Maybe we should pursue a growing buffer size...?  Ie, where each newly
added buffer is bigger than the one before (like ArrayUtil.oversize's
growth function)... I realize that adds complexity
(RAMInputStream.seek is more fun), but this would let tiny files use
tiny RAM and huge files use few buffers.  Ie, RAMDir would scale up
and scale down well.
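
A minimal sketch of the growing-buffer idea and the seek arithmetic it
implies. Note the assumptions: this uses plain doubling (buffer i has
firstSize << i bytes) rather than ArrayUtil.oversize's growth function,
because doubling gives a closed-form position-to-buffer mapping, and
GrowingBuffers is a hypothetical class, not a patch for
RAMFile/RAMInputStream:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical buffer list where each new buffer is twice the previous one.
final class GrowingBuffers {
    private final int firstSize;
    private final List<byte[]> buffers = new ArrayList<>();
    private long length;

    GrowingBuffers(int firstSize) {
        if (firstSize <= 0) throw new IllegalArgumentException("firstSize must be > 0");
        this.firstSize = firstSize;
    }

    // Buffer i covers [firstSize * (2^i - 1), firstSize * (2^(i+1) - 1)),
    // so the index is floor(log2(pos / firstSize + 1)).
    int bufferIndex(long pos) {
        return 63 - Long.numberOfLeadingZeros(pos / firstSize + 1);
    }

    // Absolute file position of the first byte in buffer `index`.
    long bufferStart(int index) {
        return (long) firstSize * ((1L << index) - 1);
    }

    void writeByte(byte b) {
        int index = bufferIndex(length);
        if (index == buffers.size()) {
            // Throws ArithmeticException if the next buffer would exceed int range.
            buffers.add(new byte[Math.toIntExact((long) firstSize << index)]);
        }
        buffers.get(index)[(int) (length - bufferStart(index))] = b;
        length++;
    }

    // Random access read: this is the "seek" part.
    byte readByte(long pos) {
        if (pos < 0 || pos >= length) throw new IndexOutOfBoundsException("pos=" + pos);
        int index = bufferIndex(pos);
        return buffers.get(index)[(int) (pos - bufferStart(index))];
    }

    long length() { return length; }

    int bufferCount() { return buffers.size(); }
}
```

With firstSize = 16, a 1-byte file uses a single 16-byte buffer, while
100 bytes fit in three buffers (16 + 32 + 64). A real implementation
would presumably cap the maximum buffer size, which this sketch does
not do.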

Separately: I noticed we still have IndexOutput.setLength, but, nobody
calls it anymore I think?  (In 3.x we call this when creating a CFS).
Maybe we should remove it...

                
> Improve Javadocs of RAMDirectory to document its limitations and add improvements to
> make it more GC friendly on large indexes
> ------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3659
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3659
>             Project: Lucene - Java
>          Issue Type: Task
>    Affects Versions: 3.5, 4.0
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 3.6, 4.0
>
>         Attachments: LUCENE-3659.patch, LUCENE-3659.patch, LUCENE-3659.patch
>
>
> Spinoff from several dev@lao issues:
> - [http://mail-archives.apache.org/mod_mbox/lucene-dev/201112.mbox/%3C001001ccbf1c%2471845830%24548d0890%24%40thetaphi.de%3E]
> - issue LUCENE-3653
> The use cases for RAMDirectory are very limited and to prevent users from using it for
> e.g. loading a 50 Gigabyte index from a file on disk, we should improve the javadocs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

