lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Uwe Schindler (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (LUCENE-3866) Make CompositeReader.getSequentialSubReaders() and the corresponding IndexReaderContext methods return unmodifiable List<R extends IndexReader>
Date Sun, 17 Jun 2012 23:41:42 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13393639#comment-13393639
] 

Uwe Schindler edited comment on LUCENE-3866 at 6/17/12 11:40 PM:
-----------------------------------------------------------------

Here the first step of this refactoring (affects only few core classes):

- getSequentialSubReaders() returns a List<? extends IndexReader> (unmodifiable)
- Nuked ReaderUtil.Gather completely (this one is ineffective and slow because of useless
recursion). We can use ReaderContext.leaves() to get all atomic sub readers including their
docStarts
- Improved MultiFields and MultiDocValues by usage of ReaderContext.leaves(). Code much easier
to read!
- Moved ReaderUtil.Slice to separate class (it's somehow unrelated), but has nothing to do
with ReaderUtil anymore -> Should move to index package + hidden

There is still one (slow) nocommit, because step 2 will refactor the ReaderContexts to use
Collections, too.

Yonik:

bq. I think we should allow users to access in the most low-level efficient manner, just as
lucene can. Remember, this is expert level stuff!  Seems like at most we would need a javadoc
comment saying "don't modify this".

Actually it's more efficient than before. And the collection views are still backed by arrays
and are never used in inner loops (it's just when IndexSearcher initializes or executes searches
on all segments). Without any real benchmark showing a slowness (please do this *after* step
2 -> nocommit) I don't think we should risk corrumpt readers.
                
      was (Author: thetaphi):
    Here the first step of this refactoring (affects only few core classes):

- getSequentialSubReaders() returns a List<? extends IndexReader> (unmodifiable)
- Nuked ReaderUtil.Gather completely (this one is ineffective and slow because of useless
recursion). We can use ReaderContext.leaves() to get all atomic sub readers including their
docStarts
- Moved ReaderUtil.Slice to separate class (it's somehow unrelated), but has nothing to do
with ReaderUtil anymore -> Should move to index package + hidden

There is still one (slow) nocommit, because step 2 will refactor the ReaderContexts to use
Collections, too.

Yonik:

bq. I think we should allow users to access in the most low-level efficient manner, just as
lucene can. Remember, this is expert level stuff!  Seems like at most we would need a javadoc
comment saying "don't modify this".

Actually it's more efficient than before. And the collection views are still backed by arrays
and are never used in inner loops (it's just when IndexSearcher initializes or executes searches
on all segments). Without any real benchmark showing a slowness (please do this *after* step
2 -> nocommit) I don't think we should risk corrumpt readers.
                  
> Make CompositeReader.getSequentialSubReaders() and the corresponding IndexReaderContext
methods return unmodifiable List<R extends IndexReader>
> -----------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3866
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3866
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 4.0
>
>         Attachments: LUCENE-3866-step1.patch
>
>
> Since Lucene 2.9 we have the method getSequentialSubReader() returning IndexReader[].
Based on hardly-to-debug errors in user's code, Robert and me realized that returning an array
from a public API is an anti-pattern. If the array is intended to be modifiable (like byte[]
in terms,...), it is fine to use arrays in public APIs, but not, if the array must be protected
from modification. As IndexReaders are 100% unmodifiable in trunk code (no deletions,...),
the only possibility to corrumpt the reader is by modifying the array returned by getSequentialSubReaders().
We should prevent this.
> The same theoretically applies to FieldCache, too - but the party that is afraid of performance
problems is too big to fight against that :(
> For getSequentialSubReaders there is no performance problem at all. The binary search
of reader-ids inside BaseCompositeReader can still be done with the internal protected array,
but public APIs should expose only a unmodifiable List. The same applies to leaves() and children()
in IndexReaderContext. This change to list would also allow to make CompositeReader and CompositeReaderContext
Iterable<R extends IndexReader>, so some loops would look nice.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message