lucene-dev mailing list archives

From Earwin Burrfoot <ear...@gmail.com>
Subject Re: [jira] Commented: (LUCENE-2691) Consolidate Near Real Time and Reopen API semantics
Date Thu, 25 Nov 2010 22:41:14 GMT
> Earwin, I have used MMAP a lot; it is quite nice and has its place under the sun,
> but it is not a silver bullet; it has its quirks... The same goes for
> RAMDirectory.
OK, I pointed you to a directory that can be wrapped around an FSDirectory
and loads files into a memory buffer when they are requested.

> bq. There is zero need for any such signal. ...non-existing file ...
>
> Why would IndexReader ever try to read a non-existing file? IR is going to see
> its RAMDirectory point-in-time snapshot of the index until you somehow try to
> reload the updated index image from disk.
What is your aim?

Have a RAMDir pre-loaded from an FSDir, open your reader over this
RAMDir, and when you reopen the reader you want the contents of the RAMDir
to be updated. Riiight?

Very well: have your RAMDir delegate the listFiles call to the backing
FSDir, so the reader knows which files exist, and when the reader actually
tries to open a file that exists on disk but hasn't been opened before, your
directory loads it into memory.
If you want, you can also refcount files as they are opened and closed,
and when a file is no longer used, drop it from memory (leaving the disk
copy intact).
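A minimal sketch of that scheme in plain Java, so it runs without Lucene. BackingStore, LazyRamCache and their methods are hypothetical stand-ins, not Lucene's Directory API: listing is delegated to the backing store, a file is copied into memory on its first open, and a per-file refcount drops the memory copy when the last user closes it, leaving the backing copy alone.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the on-disk directory (e.g. an FSDirectory).
interface BackingStore {
    List<String> listAll();
    byte[] read(String name); // loads the whole file; assumed to exist
}

// Hypothetical in-memory proxy: loads on first open, drops on last close.
class LazyRamCache {
    private final BackingStore backing;
    private final Map<String, byte[]> loaded = new HashMap<>();
    private final Map<String, Integer> refCounts = new HashMap<>();

    LazyRamCache(BackingStore backing) {
        this.backing = backing;
    }

    // Listing is delegated, so a reopening reader sees newly written files.
    synchronized List<String> listAll() {
        return backing.listAll();
    }

    // Open: copy into memory only the first time, then bump the refcount.
    synchronized byte[] open(String name) {
        byte[] data = loaded.get(name);
        if (data == null) {
            data = backing.read(name);
            loaded.put(name, data);
        }
        refCounts.merge(name, 1, Integer::sum);
        return data;
    }

    // Close: when nobody uses the file any more, free the memory copy only.
    synchronized void close(String name) {
        Integer count = refCounts.get(name);
        if (count == null) {
            throw new IllegalStateException("not open: " + name);
        }
        if (count == 1) {
            refCounts.remove(name);
            loaded.remove(name);
        } else {
            refCounts.put(name, count - 1);
        }
    }

    synchronized boolean isLoaded(String name) {
        return loaded.containsKey(name);
    }
}
```

A real wrapper would presumably do the same bookkeeping in its openInput() and in the close() of the input it returns; the sketch only shows the bookkeeping itself.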

There's no reason at all for the reader to know that its Directory
might be a proxy, or to notify it of anything besides opening and
closing files. Proxying can be completely transparent.

> On Thu, Nov 25, 2010 at 6:00 PM, Earwin Burrfoot (JIRA) <jira@apache.org>
> wrote:
>>
>>    [
>> https://issues.apache.org/jira/browse/LUCENE-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935794#action_12935794
>> ]
>>
>> Earwin Burrfoot commented on LUCENE-2691:
>> -----------------------------------------
>>
>> {quote}
>> bq. You're still okay with an API that allows you to reopen IRs on
>> different directories?
>> Well, that's no good - we can catch this and throw an exc?
>> {quote}
>> I don't understand why we should bother with checking and throwing
>> exceptions when we can prevent such things from compiling at all,
>> by using an API that doesn't support reopening on anything other than the
>> original source.
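A minimal sketch of the compile-time alternative, with hypothetical names (IndexSource, SnapshotReader): the reader captures its source when it is opened and reopen() takes no argument, so "reopen this reader on a different directory" simply cannot be written. It returns the same instance when nothing has changed.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical index source that can report its latest version.
final class IndexSource {
    private final AtomicLong version = new AtomicLong();
    void commit() { version.incrementAndGet(); }
    long latestVersion() { return version.get(); }
}

// Hypothetical reader: the source is fixed at open time and reopen() has
// no parameter, so switching sources on reopen is not expressible.
final class SnapshotReader {
    private final IndexSource source;
    private final long version;

    private SnapshotReader(IndexSource source, long version) {
        this.source = source;
        this.version = version;
    }

    static SnapshotReader open(IndexSource source) {
        return new SnapshotReader(source, source.latestVersion());
    }

    // Same instance if unchanged, otherwise a fresh snapshot of the SAME source.
    SnapshotReader reopen() {
        long latest = source.latestVersion();
        return latest == version ? this : new SnapshotReader(source, latest);
    }

    long version() { return version; }
}
```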
>>
>> bq. Really, there are two separate "things" open/reopen needs:
>> That's not true. Take a look at my WriterBackedReader above (or
>> DirectoryReader in trunk). It requires the writer, at least to call
>> deleteUnusedFiles() and nrtIsCurrent().
>> So you can't easily reopen between Directory-backed and Writer-backed
>> readers without a lot of switching and checking.
>>
>> bq. r_ram.reload(); //Here we want to reload from the FSDirectory?
>> Use MMapDirectory? It's only slightly slower for searches, and it doesn't
>> hammer your GC on big indexes.
>> Also check this out: https://gist.github.com/715617
>> It is a RAMDirectory offspring that wraps any other given directory and
>> basically does what you want (if I guessed right).
>> It doesn't split files into blocks, so the file size limit is 2 GB, but this
>> can easily be fixed. On the upside, it reads a file into memory only after
>> its size is known (unlike RAMDir), which allows you to use huge, precisely
>> sized blocks, lessening GC pressure.
>> I used it for a long time, but then my indexes grew, heaps followed, the VM
>> exploded, and I switched to MMapDirectory (with minor patches).
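The gist itself is not reproduced here. As a minimal sketch of the "learn the size first, then allocate one exact-size buffer" idea, assuming plain java.nio.file access instead of a Lucene Directory (ExactSizeLoader and load() are hypothetical names), the following reads a whole file into a single array of exactly its length. Because Java arrays are int-indexed, one array per file is what imposes the roughly 2 GB ceiling; lifting it means splitting the file into several blocks.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: one precisely sized array per file, no resizing.
final class ExactSizeLoader {
    static byte[] load(Path path) throws IOException {
        long size = Files.size(path); // learn the size before allocating
        if (size > Integer.MAX_VALUE) {
            // Arrays are int-indexed (real VMs may cap slightly lower),
            // hence the ~2 GB per-file limit without blocking.
            throw new IOException("file too large for one array: " + size);
        }
        byte[] buffer = new byte[(int) size];
        try (DataInputStream in = new DataInputStream(Files.newInputStream(path))) {
            in.readFully(buffer); // EOFException if the file shrank meanwhile
        }
        return buffer;
    }
}
```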
>>
>> bq. What is missing is a "signal" from IR.reload() to RAMDirectory to
>> slurp fresh information from FSDirectory?
>> There is zero need for any such signal. If a reader requests a non-existing
>> file from RAMDirectory, it should check the backing dir before throwing an
>> exception. If the backing dir does have the file, it is loaded and opened.
>> Why do you people love complicating things that much? :)
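A minimal sketch of that lookup order, with hypothetical names (FallbackRamDirectory, openFile(), and a Function standing in for the backing directory): memory first, then the backing store, and FileNotFoundException only if neither has the file. The reader never sends a reload signal; a miss is enough.

```java
import java.io.FileNotFoundException;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical RAM directory that consults a backing store on a miss.
final class FallbackRamDirectory {
    private final Map<String, byte[]> files = new ConcurrentHashMap<>();
    // Stand-in for the backing directory: empty Optional if the file is absent.
    private final Function<String, Optional<byte[]>> backing;

    FallbackRamDirectory(Function<String, Optional<byte[]>> backing) {
        this.backing = backing;
    }

    byte[] openFile(String name) throws FileNotFoundException {
        byte[] cached = files.get(name);
        if (cached != null) {
            return cached;
        }
        // Miss: check the backing directory before giving up.
        Optional<byte[]> fromDisk = backing.apply(name);
        if (!fromDisk.isPresent()) {
            throw new FileNotFoundException(name);
        }
        // Load it; if another thread raced us, keep whichever copy won.
        byte[] existing = files.putIfAbsent(name, fromDisk.get());
        return existing != null ? existing : fromDisk.get();
    }

    boolean inMemory(String name) {
        return files.containsKey(name);
    }
}
```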
>>
>> > Consolidate Near Real Time and Reopen API semantics
>> > ---------------------------------------------------
>> >
>> >                 Key: LUCENE-2691
>> >                 URL: https://issues.apache.org/jira/browse/LUCENE-2691
>> >             Project: Lucene - Java
>> >          Issue Type: Improvement
>> >            Reporter: Grant Ingersoll
>> >            Assignee: Grant Ingersoll
>> >            Priority: Minor
>> >             Fix For: 4.0
>> >
>> >         Attachments: LUCENE-2691.patch, LUCENE-2691.patch
>> >
>> >
>> > We should consolidate the IndexWriter.getReader and the
>> > IndexReader.reopen semantics. Since most people are already using the
>> > IR.reopen() method, we should simply add:
>> > {code}
>> > IR.reopen(IndexWriter)
>> > {code}
>> > Initially, it could just call IW.getReader(), but it should probably
>> > switch to using package-private methods for sharing the internals.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: dev-help@lucene.apache.org
>>
>
>



-- 
Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
Phone: +7 (495) 683-567-4
ICQ: 104465785
