lucene-solr-dev mailing list archives

From Ryan McKinley <ryan...@gmail.com>
Subject Re: lucene 2.9 migration issues -- MultiReader vs IndexReader document ids
Date Mon, 20 Apr 2009 23:14:12 GMT
Thanks!

Everything got better when I removed my logic to cache based on the
index modification time.


On Apr 20, 2009, at 4:51 PM, Yonik Seeley wrote:

> On Mon, Apr 20, 2009 at 4:17 PM, Ryan McKinley <ryantxu@gmail.com>  
> wrote:
>> This issue started on java-user, but I am moving it to solr-dev:
>> http://www.lucidimagination.com/search/document/46481456bc214ccb/bitset_filter_arrayindexoutofboundsexception
>>
>> I am using Solr trunk and building an RTree from stored document
>> fields. This process worked fine until a recent change in 2.9 that
>> uses a different document id strategy than I was used to.
>>
>> In that thread, Yonik suggested:
>> - pop back to the top level from the sub-reader, if you really need
>>   a single set
>> - if a set-per-reader will work, then cache per segment (better for
>>   incremental updates anyway)
>>
>> I'm not quite sure what you mean by a "set-per-reader".
>
> I meant RTree per reader (per segment reader).
>
>>  Previously I was
>> building a single RTree and using it until the last modified time had
>> changed. This avoided rebuilding the RTree whenever a new reader was
>> opened and the index had not changed.
>
> I *think* that our use of re-open will return the same IndexReader
> instance if nothing has changed... so you shouldn't have to try and do
> that yourself.
>
>>  I'm fine building a new RTree for each reader if
>> that is required.
>
> If that works just as well, it will put you in a better position for
> faster incremental updates... new RTrees will be built only for those
> segments that have changed.
>
>> Is there any existing code that deals with this situation?
>
> To cache an RTree per reader, you could use the same logic as
> FieldCache uses... a weak map with the reader as the key.
>
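A minimal sketch of the weak-map idea, not the FieldCache code itself. The
class name PerReaderCache and the type parameters R (standing in for a
segment reader) and V (standing in for an RTree) are hypothetical, and the
builder function is an assumption about how the tree would be constructed.

```java
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

/**
 * Caches one derived value per reader instance.
 * R stands in for a (segment) reader, V for the structure built from it.
 */
final class PerReaderCache<R, V> {
    // Weak keys: an entry can be collected once its reader is no longer
    // strongly reachable. The cached value must not hold a strong
    // reference back to the reader, or the entry will never be released.
    private final Map<R, V> cache = new WeakHashMap<>();
    private final Function<R, V> builder;

    PerReaderCache(Function<R, V> builder) {
        this.builder = builder;
    }

    /** Returns the cached value for this reader, building it on first use. */
    synchronized V get(R reader) {
        V value = cache.get(reader);
        if (value == null) {
            value = builder.apply(reader);
            cache.put(reader, value);
        }
        return value;
    }
}
```

With this shape, a reader instance that survives a reopen keeps its cached
value, and only readers not seen before trigger a build.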
> If a single top-level RTree that covers the entire index works better
> for you, then you can cache the RTree based on the top level multi
> reader and translate the ids... that was my fix for ExternalFileField.
> See FileFloatSource.getValues() for the implementation.
>
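A rough sketch of the id translation, not the FileFloatSource code. It
assumes the top-level reader numbers documents by concatenating its
sub-readers in order, so a top-level id is the segment's starting offset
plus the segment-local id. The class and method names are hypothetical.

```java
/** Offset arithmetic between segment-local and top-level document ids. */
final class DocIdBases {
    private DocIdBases() {}

    /**
     * starts[i] is the top-level id of the first document in segment i;
     * starts[maxDocs.length] is the total document count.
     */
    static int[] starts(int[] maxDocs) {
        int[] starts = new int[maxDocs.length + 1];
        for (int i = 0; i < maxDocs.length; i++) {
            starts[i + 1] = starts[i] + maxDocs[i];
        }
        return starts;
    }

    /** Converts a segment-local id to a top-level id. */
    static int toTopLevel(int[] starts, int segment, int localDoc) {
        return starts[segment] + localDoc;
    }

    /** Finds the segment that contains a top-level id (linear scan). */
    static int segmentOf(int[] starts, int topDoc) {
        if (topDoc < 0 || topDoc >= starts[starts.length - 1]) {
            throw new IndexOutOfBoundsException("doc id out of range: " + topDoc);
        }
        for (int i = 0; i < starts.length - 1; i++) {
            if (topDoc < starts[i + 1]) {
                return i;
            }
        }
        throw new AssertionError("unreachable");
    }
}
```

For example, with segments of 3, 5 and 2 documents, local id 2 in the
second segment maps to top-level id 3 + 2 = 5.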
>
>> - - - -
>>
>> Yonik also suggested:
>>
>>  Relatively new in 2.9, you can pass null to enumerate over all
>>  non-deleted docs:
>>  TermDocs td = reader.termDocs(null);
>>
>>  It would probably be a lot faster to iterate over indexed values though.
>>
>> If I iterate over indexed values (from the FieldCache, I presume), then
>> how do I get access to the document id?
>
> IndexReader.terms(Term t) returns a TermEnum that can iterate over
> terms, starting at t.
> IndexReader.termDocs(Term t or TermEnum te) will give you the list of
> documents that match a term.
>
>
> -Yonik

