lucene-dev mailing list archives

From "Jason Rutherglen" <jason.rutherg...@gmail.com>
Subject Re: SegmentReader with custom setting of deletedDocs, single reusable FieldsReader
Date Sun, 29 Jun 2008 13:42:05 GMT
I've been looking more at how to improve the IndexReader.document call.
There are a few options.  I implemented the IndexReader.documents call, which
has the downside of not being backward compatible.  Probably the only way
to achieve both ends is a ThreadLocal, which I noticed is what term vectors
already use.  Doesn't that raise the issue of too many open file descriptors
for term vectors if there are many reopens?  It would seem that copying
the reference to termVectorsLocal on reopen would help with this.  If this
is amenable, then the same could be done for fieldsReader with a
fieldsReaderThreadLocal.
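A minimal sketch of the per-thread approach, assuming the segment's stored fields do not change between reopens. `FieldsHandle` and `SegmentSketch` are hypothetical stand-ins, not Lucene's `FieldsReader` or `SegmentReader`; an in-memory list plays the role of the stored-fields file, and a counter plays the role of open file descriptors. The point is that `reopen()` copies the `ThreadLocal` reference instead of creating a new one, so each thread keeps reusing the handle it already opened.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a FieldsReader: one instance = one handle with its own pointer. */
class FieldsHandle {
    // Counts handles ever opened; stands in for file descriptors in this sketch.
    static final AtomicInteger opened = new AtomicInteger();
    private final List<String> docs; // stands in for the stored-fields file
    private int pointer;             // per-handle seek position

    FieldsHandle(List<String> docs) {
        this.docs = docs;
        opened.incrementAndGet();
    }

    String doc(int n) {
        pointer = n;              // "seek"
        return docs.get(pointer); // "read"
    }
}

/** Hypothetical segment reader holding one FieldsHandle per thread. */
class SegmentSketch {
    private final ThreadLocal<FieldsHandle> fieldsLocal;

    SegmentSketch(List<String> docs) {
        // Each thread lazily opens its own handle the first time it calls document().
        this(ThreadLocal.withInitial(() -> new FieldsHandle(docs)));
    }

    private SegmentSketch(ThreadLocal<FieldsHandle> shared) {
        this.fieldsLocal = shared;
    }

    // No lock needed: a thread only ever touches its own handle.
    String document(int n) {
        return fieldsLocal.get().doc(n);
    }

    // Reopen shares the same ThreadLocal (assumes stored fields are unchanged),
    // so threads that already have a handle do not open another one.
    SegmentSketch reopen() {
        return new SegmentSketch(fieldsLocal);
    }
}
```

One caveat the sketch does not handle: handles held in a ThreadLocal are only released when the thread dies or the value is removed, so a real implementation would still need a way to close them once the last reader sharing them is closed.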

As it stands, IndexReader.document is really a lame duck.  Because the
IndexReader.document call is synchronized at the top level, it drags down
the performance of systems that store data in Lucene.  A single file
descriptor shared by all threads, on an index that is constantly returning
results with stored fields, is a serious bottleneck.  Users are always
complaining about this issue and now I know why.
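To make the contention concrete, here is a minimal sketch of that shape, with hypothetical class names rather than Lucene's: one handle with a single file pointer is shared by every thread, so the seek and the read must happen atomically under one lock. The lock keeps results correct, but every thread fetching stored fields waits on the same monitor.

```java
import java.util.List;

/** Hypothetical stand-in for one open file: a single shared seek position. */
class SharedFieldsFile {
    private final List<String> docs;
    private int pointer;

    SharedFieldsFile(List<String> docs) { this.docs = docs; }

    void seek(int n) { pointer = n; }
    String read() { return docs.get(pointer); }
}

/** One handle for all threads, guarded by a single top-level lock. */
class SynchronizedReader {
    private final SharedFieldsFile file;

    SynchronizedReader(SharedFieldsFile file) { this.file = file; }

    // Without synchronized, another thread could seek between our seek() and read()
    // and we would return the wrong document; with it, all callers are serialized.
    synchronized String document(int n) {
        file.seek(n);
        return file.read();
    }
}
```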

This should be a separate issue from IndexReader.clone.

On Sun, Jun 29, 2008 at 5:41 AM, Michael McCandless <
lucene@mikemccandless.com> wrote:

>
> Jason Rutherglen wrote:
>
>  A possible solution to FieldsReader is to have an IndexReader.documents()
>> method that returns a Documents class.  The Documents class maintains an
>> underlying FieldsReader and file descriptor that can be closed like TermEnum
>> or TermDocs etc.  Of course it would have a document(int n, FieldSelector
>> selector) method.  The issue is what the default behavior would be for
>> IndexReader.document for the SegmentReader.clone/reopen(boolean force).  I'm
>> not sure how efficient it would be to open and close a FieldsReader per
>> IndexReader.document call.
>>
>
> I don't think we want to open/close FieldsReader per document() call?
>
> I think we should test simply synchronizing FieldsReader, first.  I think
> that's the simplest solution.  Modern JVMs have apparently gotten better
> about synchronized calls, especially when there is little contention.  In
> the typical usage of Lucene there would be no contention.  If the
> performance cost is negligible then it makes SegmentReader.doReopen very
> simple -- no external locking or special subclassing is necessary.
>
>  I was using InstantiatedIndex and performing a commit when changes came
>> in, but realized that during the commit incoming searches could see wrong
>> results.  Now, as the InstantiatedIndex javadocs suggest, each one is
>> immutable.
>>
>> The IndexReader over the RAM buffer sounds good.  As an interim solution
>> it would be beneficial to have the SegmentReader.clone/reopen(boolean force)
>> so that the first version of Ocean can be completed and I can move on to
>> other projects like Tag Index.
>>
>
> I agree we should still implement a SegmentReader.clone.  Jason can you
> update the patch?  (change from boolean force to clone(); synchronization of
> FieldsReader; undoing the mixed-up import line shuffling).
>
>
> Mike
>
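The `Documents` handle proposed in the quoted message can be sketched as follows. Everything here is hypothetical (these are not classes or methods of the IndexReader API being discussed); the `FieldSelector` parameter is left out to keep the example self-contained, and an in-memory list stands in for the stored-fields file. Each `documents()` call opens its own handle, which the caller closes like a `TermEnum` or `TermDocs`, so no lock is needed inside `Documents`.

```java
import java.io.Closeable;
import java.util.List;

/** Hypothetical stand-in for an open stored-fields file; close() releases the "descriptor". */
class FieldsFile implements Closeable {
    private final List<String> docs;
    private boolean closed;

    FieldsFile(List<String> docs) { this.docs = docs; }

    String read(int n) {
        if (closed) throw new IllegalStateException("already closed");
        return docs.get(n);
    }

    boolean isClosed() { return closed; }

    @Override public void close() { closed = true; }
}

/** Sketch of the proposed Documents handle: caller-owned, never shared between threads. */
class Documents implements Closeable {
    private final FieldsFile file;

    Documents(FieldsFile file) { this.file = file; }

    // The proposal also takes a FieldSelector; omitted in this sketch.
    String document(int n) { return file.read(n); }

    boolean isClosed() { return file.isClosed(); }

    @Override public void close() { file.close(); }
}

/** Hypothetical reader exposing documents() alongside the existing per-call lookup. */
class ReaderSketch {
    private final List<String> docs;

    ReaderSketch(List<String> docs) { this.docs = docs; }

    // Each call opens a fresh handle owned by the caller.
    Documents documents() { return new Documents(new FieldsFile(docs)); }
}
```

The open question from the thread remains: routing the existing `IndexReader.document(n)` through this would mean opening and closing a handle per call, which is why the sketch keeps `documents()` as a separate, caller-owned method.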
