lucene-solr-dev mailing list archives

From Yonik Seeley <yo...@lucidimagination.com>
Subject Re: lucene 2.9 migration issues -- MultiReader vs IndexReader document ids
Date Mon, 20 Apr 2009 20:51:31 GMT
On Mon, Apr 20, 2009 at 4:17 PM, Ryan McKinley <ryantxu@gmail.com> wrote:
> This issue started on java-user, but I am moving it to solr-dev:
> http://www.lucidimagination.com/search/document/46481456bc214ccb/bitset_filter_arrayindexoutofboundsexception
>
> I am using solr trunk and building an RTree from stored document fields.
>  This process worked fine until a recent change in 2.9 that uses a different
> document id strategy than I was used to.
>
> In that thread, Yonik suggested:
> - pop back to the top level from the sub-reader, if you really need a single
> set
> - if a set-per-reader will work, then cache per segment (better for
> incremental updates anyway)
>
> I'm not quite sure what you mean by a "set-per-reader".

I meant RTree per reader (per segment reader).

>  Previously I was
> building a single RTree and using it until the last modified time had
> changed.  This avoided building an index anytime a new reader was opened and
> the index had not changed.

I *think* that our use of re-open will return the same IndexReader
instance if nothing has changed... so you shouldn't have to try and do
that yourself.

> I'm fine building a new RTree for each reader if
> that is required.

If that works just as well, it will put you in a better position for
faster incremental updates... new RTrees will be built only for those
segments that have changed.

> Is there any existing code that deals with this situation?

To cache an RTree per reader, you could use the same logic as
FieldCache uses... a weak map with the reader as the key.
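
As a rough sketch of that weak-map idea (this is not Solr or Lucene code: the
PerReaderCache class, its type parameters, and the builder function are all
hypothetical names for illustration, and any object can stand in for the
segment reader):

```java
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

/** Caches one derived value (for example an RTree) per reader, keyed weakly on the reader. */
final class PerReaderCache<R, V> {
    // A WeakHashMap entry becomes collectable once its key (the reader)
    // is no longer strongly referenced anywhere else.
    private final Map<R, V> cache = new WeakHashMap<R, V>();

    // Hypothetical hook: builds the value for a reader on a cache miss.
    private final Function<R, V> builder;

    PerReaderCache(Function<R, V> builder) {
        this.builder = builder;
    }

    /** Returns the cached value for this reader, building it on first use. */
    synchronized V get(R reader) {
        V value = cache.get(reader);
        if (value == null) {
            value = builder.apply(reader);
            cache.put(reader, value);
        }
        return value;
    }
}
```

One caveat with this kind of map: if the cached value holds a strong
reference back to its reader, the entry can never be collected.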

If a single top-level RTree that covers the entire index works better
for you, then you can cache the RTree based on the top level multi
reader and translate the ids... that was my fix for ExternalFileField.
 See FileFloatSource.getValues() for the implementation.
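
The id translation itself is just offset arithmetic. A minimal sketch,
assuming you know each sub-reader's maxDoc in order and there is at least one
sub-reader (DocIdTranslator and its methods are made-up names for
illustration; this is not the FileFloatSource code):

```java
/** Maps between segment-local doc ids and top-level (multi reader) doc ids. */
final class DocIdTranslator {
    // starts[i] is the top-level id of the first doc in sub-reader i;
    // the last element is the total maxDoc across all sub-readers.
    private final int[] starts;

    DocIdTranslator(int[] maxDocs) {
        starts = new int[maxDocs.length + 1];
        for (int i = 0; i < maxDocs.length; i++) {
            starts[i + 1] = starts[i] + maxDocs[i];
        }
    }

    /** Segment-local id to top-level id: add the sub-reader's starting offset. */
    int toTopLevel(int readerIndex, int localDoc) {
        return starts[readerIndex] + localDoc;
    }

    /** Index of the sub-reader holding topDoc (the last start that is <= topDoc). */
    int readerIndex(int topDoc) {
        int lo = 0;
        int hi = starts.length - 2;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (starts[mid] <= topDoc) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }

    /** Top-level id to segment-local id: subtract the sub-reader's starting offset. */
    int toLocal(int topDoc) {
        return topDoc - starts[readerIndex(topDoc)];
    }
}
```

With sub-readers of maxDoc 3, 5 and 2, local doc 2 in the second sub-reader
is top-level doc 3 + 2 = 5.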


> - - - -
>
> Yonik also suggested:
>
>  Relatively new in 2.9, you can pass null to enumerate over all non-deleted
> docs:
>  TermDocs td = reader.termDocs(null);
>
>  It would probably be a lot faster to iterate over indexed values though.
>
> If I iterate over indexed values (from the FieldCache, I presume) then how do I
> get access to the document id?

IndexReader.terms(Term t) returns a TermEnum that can iterate over
terms, starting at t.
IndexReader.termDocs(Term t or TermEnum te) will give you the list of
documents that match a term.


-Yonik
