lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-4484) NRTCachingDir can't handle large files
Date Tue, 16 Oct 2012 11:03:03 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-4484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476908#comment-13476908 ]

Michael McCandless commented on LUCENE-4484:
--------------------------------------------

bq. Can uncache() be changed to return the still-open newly created IndexOutput?

I think we'd have to wrap the RAMOutputStream ... then we could 1) know when too many
bytes have been written, 2) close the wrapped RAMOutputStream and call uncache to move it
to disk, 3) fix uncache to return the IndexOutput instead of closing it, and 4) cut the
wrapper over to the new on-disk IndexOutput.  And all of this would have to happen inside
a writeByte/writeBytes call (from the caller's standpoint) ... it seems hairy.
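
For what it's worth, the wrapper idea could look roughly like this.  This is a minimal
sketch only: plain java.io streams stand in for RAMOutputStream / IndexOutput, and the
SpillingOutput name and threshold are made up for illustration, not Lucene API:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch: starts writing into RAM, and once too many bytes
 *  have been written it "uncaches" -- moves the buffered bytes to an
 *  on-disk stream and cuts all further writes over to it. */
class SpillingOutput extends OutputStream {
    private final long maxRamBytes;
    private final Path onDiskFile;
    private ByteArrayOutputStream ram = new ByteArrayOutputStream();
    private OutputStream disk;   // non-null once we've spilled to disk
    private long bytesWritten;

    SpillingOutput(long maxRamBytes, Path onDiskFile) {
        this.maxRamBytes = maxRamBytes;
        this.onDiskFile = onDiskFile;
    }

    @Override public void write(int b) throws IOException {
        if (disk == null && bytesWritten + 1 > maxRamBytes) {
            // the "uncache" step: copy buffered bytes to disk, then
            // keep writing there -- all inside this write call
            disk = Files.newOutputStream(onDiskFile);
            ram.writeTo(disk);
            ram = null;
        }
        (disk != null ? disk : ram).write(b);
        bytesWritten++;
    }

    boolean spilled() { return disk != null; }

    @Override public void close() throws IOException {
        if (disk != null) disk.close();
    }
}
```

The hairy part the comment above alludes to is that the spill happens invisibly, mid-write,
from the caller's standpoint.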

We could also just leave it be, i.e. advertise this limitation.  NRTCachingDir is already
hairy enough...  The purpose of this directory is an NRT setting where reopens are relatively
frequent compared to the indexing rate, which naturally keeps files plenty small.  It's also
quite unusual to index only stored fields in an NRT setting (which is what this test does).

Yet another option would be to let the indexer flush based on the size of the stored fields
/ term vectors files ... today of course we exclude these from the RAM accounting entirely,
since we write their bytes directly to disk.  Maybe ... the app could pass the indexer an
AtomicInt/Long recording "bytes held elsewhere in RAM", and the indexer would add that into
its logic for when to trigger a flush...
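
A sketch of that last idea.  All names here (SharedRamFlushTrigger, bytesHeldElsewhere)
are made up for illustration and are not Lucene APIs; the point is just that the app and
the indexer share one AtomicLong, and the indexer folds it into its flush check:

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch: the app (e.g. via its directory) keeps an AtomicLong
 *  of "bytes held elsewhere in RAM", and the indexer adds that into its own
 *  RAM accounting when deciding whether to flush. */
class SharedRamFlushTrigger {
    private final long ramBufferBytes;
    private final AtomicLong bytesHeldElsewhere;  // updated by the app/directory
    private long indexerRamBytes;                 // the indexer's own accounting

    SharedRamFlushTrigger(long ramBufferBytes, AtomicLong bytesHeldElsewhere) {
        this.ramBufferBytes = ramBufferBytes;
        this.bytesHeldElsewhere = bytesHeldElsewhere;
    }

    void addIndexerBytes(long bytes) { indexerRamBytes += bytes; }

    boolean shouldFlush() {
        // today only indexerRamBytes is counted; adding the externally held
        // bytes means huge cached stored-fields files would trigger a flush
        return indexerRamBytes + bytesHeldElsewhere.get() >= ramBufferBytes;
    }
}
```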
                
> NRTCachingDir can't handle large files
> --------------------------------------
>
>                 Key: LUCENE-4484
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4484
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Michael McCandless
>
> I dug into this OOME, which easily repros for me on rev 1398268:
> {noformat}
> ant test  -Dtestcase=Test4GBStoredFields -Dtests.method=test -Dtests.seed=2D89DD229CD304F5
> -Dtests.multiplier=3 -Dtests.nightly=true -Dtests.slow=true -Dtests.linedocsfile=/home/hudson/lucene-data/enwiki.random.lines.txt
> -Dtests.locale=ru -Dtests.timezone=Asia/Vladivostok -Dtests.file.encoding=UTF-8 -Dtests.verbose=true
> {noformat}
> The problem is the test got NRTCachingDir ... which cannot handle large files because
> it decides up front (when createOutput is called) whether the file will be in RAMDir vs
> wrapped dir ... so if that file turns out to be immense (which this test does since stored
> fields files can grow arbitrarily huge w/o any flush happening) then it takes unbounded RAM.
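
The up-front decision described in the quoted text could be sketched like this (simplified,
hypothetical names; the real check inside NRTCachingDirectory is more involved and considers
more than just sizes).  The key point is that it runs once, at createOutput time, and is
never revisited as the file grows:

```java
/** Hypothetical sketch of NRTCachingDir's per-file choice: decided once
 *  when the file is created, based on an up-front size estimate, and never
 *  revisited -- so a file that later grows huge stays pinned in RAM. */
class CacheDecision {
    static boolean cacheInRam(long expectedFileBytes,
                              long alreadyCachedBytes,
                              long maxCachedBytes) {
        // cache only if the estimate fits alongside what's already cached
        return expectedFileBytes + alreadyCachedBytes <= maxCachedBytes;
    }
}
```

If expectedFileBytes badly underestimates the final size (as with stored fields files that
grow without any flush), the file is cached anyway and RAM use is unbounded.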

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

