lucene-dev mailing list archives

From Michael McCandless
Subject Re: Realtime Search for Social Networks Collaboration
Date Tue, 09 Sep 2008 16:41:32 GMT

Yonik Seeley wrote:

> On Tue, Sep 9, 2008 at 5:28 AM, Michael McCandless wrote:
>> Yonik Seeley wrote:
>>> What about something like term freq?  Would it need to count the
>>> number of docs after the local maxDoc or is there a better way?
>> Good question...
>> I think we'd have to take a full copy of the term -> termFreq on
>> reopen?  I don't see how else to do it (I don't understand your
>> suggestion above).  So, this will clearly add to the cost of reopen.
> One could adjust the freq by iterating over the term's documents...
> skipTo(localMaxDoc) and count how many are after that, then subtract
> from the freq.  I didn't say it was a *good* idea :-)

Ahh, OK :)
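
For illustration only, the adjustment Yonik describes could be sketched as below. This is not Lucene code: the postings list is modeled as a sorted array of doc IDs, and the names PostingsSketch, skipTo and visibleFreq are hypothetical. It also assumes localMaxDoc is exclusive, so any doc ID >= localMaxDoc is invisible to the reader.

```java
import java.util.Arrays;

// Hypothetical stand-in for one term's postings; not Lucene's API.
final class PostingsSketch {

    /**
     * Returns the index of the first doc ID >= target, which is the
     * position a skipTo(target) call would land on. Assumes the array
     * is sorted and contains no duplicate doc IDs.
     */
    static int skipTo(int[] sortedDocIds, int target) {
        int pos = Arrays.binarySearch(sortedDocIds, target);
        return pos >= 0 ? pos : -(pos + 1);
    }

    /**
     * Adjusts the shared frequency for a reader that may only see doc IDs
     * below localMaxDoc: count the postings at or after localMaxDoc and
     * subtract them from the shared count.
     */
    static int visibleFreq(int[] sortedDocIds, int sharedFreq, int localMaxDoc) {
        int firstHidden = skipTo(sortedDocIds, localMaxDoc);
        int hidden = sortedDocIds.length - firstHidden;
        return sharedFreq - hidden;
    }
}
```

With postings {0, 3, 5, 8, 13}, a shared freq of 5 and localMaxDoc = 6, the sketch skips to doc 8, counts two hidden docs, and reports 3. The cost is one skip plus a count per term lookup, which is why it trades reopen cost for query-time cost.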

>>>> For reading stored fields and term vectors, which are now flushed
>>>> immediately to disk, we need to somehow get an IndexInput from the
>>>> IndexOutputs that IndexWriter holds open on these files.  Or,
>>>> maybe, just open new IndexInputs?
>>> Hmmm, seems like a case of our nice and simple Directory model not
>>> having quite enough features in this case.
>> I think we can simply open IndexInputs on these files.  I believe
>> Java does the right thing on windows, such that if we are already
>> writing to the file, it does not prevent another file handle from
>> opening the file for reading.
> Yeah, I think the underlying RandomAccessFile might do the right
> thing, but IndexInput isn't required to see any changes on the fly
> (and current implementations don't) so at a minimum it would be a
> change of IndexInput semantics.  Maybe there would need to be a
> refresh() function added, or we would need to require a specific
> Directory impl?
> OR, if all writes are append-only, perhaps we don't ever need to
> invalidate the read buffer and would just need to remove the current
> logic that caches the file length and then let the underlying
> RandomAccessFile do the EOF checking.

All writes to these files are append-only, and when we open the
IndexInput we would never read beyond its current length (once we
flush our IndexOutput) because that's the local maxDocID limit.
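
A rough sketch of that idea, using plain java.io rather than Lucene's Directory/IndexInput classes: a reader is opened on a file that is still being appended to, and it is pinned to the length that was flushed at open time. The class name BoundedAppendOnlyReader and its methods are hypothetical, and the sketch assumes the writer only ever appends, so bytes below the limit never change.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical reader over an append-only file; not a Lucene class.
final class BoundedAppendOnlyReader implements AutoCloseable {
    private final RandomAccessFile in;
    private final long limit; // bytes visible to this reader, fixed at open time

    /** flushedLength is the file length the writer had flushed when this reader was opened. */
    BoundedAppendOnlyReader(File file, long flushedLength) throws IOException {
        this.in = new RandomAccessFile(file, "r"); // separate handle from the writer's
        this.limit = flushedLength;
    }

    /** Reads len bytes at offset, refusing any read that crosses the limit. */
    byte[] read(long offset, int len) throws IOException {
        if (offset < 0 || len < 0 || offset + len > limit) {
            throw new IOException("read past reader limit " + limit);
        }
        byte[] buf = new byte[len];
        in.seek(offset);
        in.readFully(buf);
        return buf;
    }

    long limit() {
        return limit;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }
}
```

Because the limit is captured once, later appends by the writer do not affect what this reader returns, and no read buffer would need invalidating. A reopened reader would simply be constructed with the newer flushed length.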

