lucene-solr-dev mailing list archives

From Mark Miller <markrmil...@gmail.com>
Subject Re: lucene 2.9 migration issues -- MultiReader vs IndexReader document ids
Date Fri, 24 Apr 2009 14:21:48 GMT
I think Shalin upgraded the jars this morning, so I'd just grab them 
again real quick.

4/4 4:46 am : Upgraded to Lucene 2.9-dev r768228

Ryan McKinley wrote:
> thanks Mark!
>
> how far is lucene /trunk from what is currently in solr?
>
> Is it something we should consider upgrading?
>
>
> On Apr 24, 2009, at 8:30 AM, Mark Miller wrote:
>
>> I just committed a fix Ryan - should work with upgraded Lucene jars.
>>
>> - Mark
>>
>> Ryan McKinley wrote:
>>> thanks!
>>>
>>>
>>> On Apr 23, 2009, at 6:32 PM, Mark Miller wrote:
>>>
>>>> Looks like it's my fault. Auto resolution was moved up to
>>>> IndexSearcher in Lucene, and it looks like SolrIndexSearcher is not
>>>> tickling it first. I'll take a look.
>>>>
>>>> - Mark
>>>>
>>>> Ryan McKinley wrote:
>>>>> Ok, not totally resolved....
>>>>>
>>>>> Things work fine when I have my custom Filter alone or with other 
>>>>> Filters, however if I add a query string to the mix it breaks with 
>>>>> an IllegalStateException:
>>>>>
>>>>> java.lang.IllegalStateException: Auto should be resolved before now
>>>>>   at org.apache.lucene.search.FieldSortedHitQueue$1.createValue(FieldSortedHitQueue.java:216)
>>>>>   at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:73)
>>>>>   at org.apache.lucene.search.FieldSortedHitQueue.getCachedComparator(FieldSortedHitQueue.java:168)
>>>>>   at org.apache.lucene.search.FieldSortedHitQueue.<init>(FieldSortedHitQueue.java:58)
>>>>>   at org.apache.solr.search.SolrIndexSearcher.getDocListAndSetNC(SolrIndexSearcher.java:1214)
>>>>>   at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:924)
>>>>>   at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:345)
>>>>>   at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:171)
>>>>>   at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
>>>>>   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>>>>>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1330)
>>>>>   at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
>>>>>
>>>>> This is for a query:
>>>>> /solr/flat/select?q=SGID&bounds=-144 2.4 -72 67 WITHIN
>>>>> bounds=XXX triggers my custom filter to kick in.
>>>>>
>>>>> Any thoughts on where to look? This error is new since upgrading the
>>>>> Lucene libs (in recent Solr).
>>>>>
>>>>> Thanks!
>>>>> ryan
>>>>>
>>>>>
>>>>> On Apr 20, 2009, at 7:14 PM, Ryan McKinley wrote:
>>>>>
>>>>>> thanks!
>>>>>>
>>>>>> everything got better when I removed my logic to cache based on 
>>>>>> the index modification time.
>>>>>>
>>>>>>
>>>>>> On Apr 20, 2009, at 4:51 PM, Yonik Seeley wrote:
>>>>>>
>>>>>>> On Mon, Apr 20, 2009 at 4:17 PM, Ryan McKinley 
>>>>>>> <ryantxu@gmail.com> wrote:
>>>>>>>> This issue started on java-user, but I am moving it to solr-dev:
>>>>>>>> http://www.lucidimagination.com/search/document/46481456bc214ccb/bitset_filter_arrayindexoutofboundsexception
>>>>>>>>
>>>>>>>>
>>>>>>>> I am using solr trunk and building an RTree from stored document
>>>>>>>> fields. This process worked fine until a recent change in 2.9 that
>>>>>>>> has a different document id strategy than I was used to.
>>>>>>>>
>>>>>>>> In that thread, Yonik suggested:
>>>>>>>> - pop back to the top level from the sub-reader, if you really need a
>>>>>>>>   single set
>>>>>>>> - if a set-per-reader will work, then cache per segment (better for
>>>>>>>>   incremental updates anyway)
>>>>>>>>
>>>>>>>> I'm not quite sure what you mean by a "set-per-reader".
>>>>>>>
>>>>>>> I meant RTree per reader (per segment reader).
>>>>>>>
>>>>>>>> Previously I was building a single RTree and using it until the last
>>>>>>>> modified time had changed. This avoided building an index any time a
>>>>>>>> new reader was opened and the index had not changed.
>>>>>>>
>>>>>>> I *think* that our use of re-open will return the same IndexReader
>>>>>>> instance if nothing has changed... so you shouldn't have to try and do
>>>>>>> that yourself.
>>>>>>>
>>>>>>>> I'm fine building a new RTree for each reader if
>>>>>>>> that is required.
>>>>>>>
>>>>>>> If that works just as well, it will put you in a better position for
>>>>>>> faster incremental updates... new RTrees will be built only for those
>>>>>>> segments that have changed.
>>>>>>>
>>>>>>>> Is there any existing code that deals with this situation?
>>>>>>>
>>>>>>> To cache an RTree per reader, you could use the same logic as
>>>>>>> FieldCache uses... a weak map with the reader as the key.
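A per-reader cache in the spirit of Yonik's suggestion can be sketched with `java.util.WeakHashMap`. This is a minimal illustration, not Solr or Lucene code: to compile without Lucene the key type is left generic (in practice it would be the segment `IndexReader`, and the value the RTree), and `PerReaderCache` and the builder function are hypothetical names.

```java
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

/**
 * Hypothetical per-reader cache (sketch). K stands for a segment reader and
 * V for the structure built from it (e.g. an RTree); both are generic so the
 * sketch compiles without Lucene on the classpath.
 */
final class PerReaderCache<K, V> {
    // Weak keys: once a reader is no longer strongly referenced anywhere else,
    // its entry becomes eligible for garbage collection.
    private final Map<K, V> cache = new WeakHashMap<>();
    private final Function<K, V> builder;

    PerReaderCache(Function<K, V> builder) {
        this.builder = builder;
    }

    /** Returns the cached value for this reader, building it on first use. */
    synchronized V get(K reader) {
        V value = cache.get(reader);
        if (value == null) {
            value = builder.apply(reader);
            cache.put(reader, value);
        }
        return value;
    }
}
```

Because the key is the segment reader, a reopen that keeps most segments only pays for building structures for the new segments. One caveat with `WeakHashMap`: the cached value must not hold a strong reference back to its key, or the entry can never be collected.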
>>>>>>>
>>>>>>> If a single top-level RTree that covers the entire index works better
>>>>>>> for you, then you can cache the RTree based on the top-level multi
>>>>>>> reader and translate the ids... that was my fix for ExternalFileField.
>>>>>>> See FileFloatSource.getValues() for the implementation.
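Translating ids between a single top-level RTree and per-segment readers comes down to docBase arithmetic. The sketch below assumes the top-level reader concatenates its segments in order, so each segment's docBase is the sum of `maxDoc()` over the segments before it. `DocBaseMap` is a hypothetical helper for illustration, not the code in `FileFloatSource.getValues()`.

```java
/**
 * Hypothetical helper (sketch): translates between segment-local and
 * top-level doc ids, assuming segment i starts at docBases[i], the sum of
 * maxDoc() over all earlier segments.
 */
final class DocBaseMap {
    private final int[] docBases; // first top-level doc id of each segment
    private final int maxDoc;     // total number of top-level doc ids

    DocBaseMap(int[] segmentMaxDocs) {
        docBases = new int[segmentMaxDocs.length];
        int base = 0;
        for (int i = 0; i < segmentMaxDocs.length; i++) {
            docBases[i] = base;
            base += segmentMaxDocs[i];
        }
        maxDoc = base;
    }

    /** Segment-local id to top-level id (e.g. when filling a top-level RTree). */
    int toTopLevel(int segment, int localDoc) {
        return docBases[segment] + localDoc;
    }

    /** Index of the segment that holds a top-level id. */
    int segmentOf(int topDoc) {
        if (topDoc < 0 || topDoc >= maxDoc) {
            throw new IllegalArgumentException("doc id out of range: " + topDoc);
        }
        // Binary search for the last segment whose docBase <= topDoc; taking
        // the last match skips empty segments that share the same docBase.
        int lo = 0, hi = docBases.length - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (docBases[mid] <= topDoc) lo = mid; else hi = mid - 1;
        }
        return lo;
    }

    /** Top-level id to segment-local id (e.g. when a per-segment Filter asks). */
    int toLocal(int topDoc) {
        return topDoc - docBases[segmentOf(topDoc)];
    }
}
```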
>>>>>>>
>>>>>>>
>>>>>>>> - - - -
>>>>>>>>
>>>>>>>> Yonik also suggested:
>>>>>>>>
>>>>>>>> Relatively new in 2.9, you can pass null to enumerate over all
>>>>>>>> non-deleted docs:
>>>>>>>> TermDocs td = reader.termDocs(null);
>>>>>>>>
>>>>>>>> It would probably be a lot faster to iterate over indexed values
>>>>>>>> though.
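Per the suggestion above, `reader.termDocs(null)` visits every doc id below `maxDoc()` that is not deleted. A stdlib-only sketch of that enumeration follows. It assumes a `java.util.BitSet` standing in for the reader's deletions so it runs without Lucene; `LiveDocs.liveDocs` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

final class LiveDocs {
    /**
     * Returns the ids in [0, maxDoc) whose bit is clear in `deleted`, in
     * increasing order (sketch; the BitSet stands in for the reader's deletions).
     */
    static List<Integer> liveDocs(int maxDoc, BitSet deleted) {
        List<Integer> docs = new ArrayList<>();
        // nextClearBit jumps straight to the next non-deleted id.
        for (int d = deleted.nextClearBit(0); d < maxDoc; d = deleted.nextClearBit(d + 1)) {
            docs.add(d);
        }
        return docs;
    }
}
```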
>>>>>>>>
>>>>>>>> If I iterate over indexed values (from the FieldCache, I presume),
>>>>>>>> then how do I get access to the document id?
>>>>>>>
>>>>>>> IndexReader.terms(Term t) returns a TermEnum that can iterate over
>>>>>>> terms, starting at t.
>>>>>>> IndexReader.termDocs(Term t or TermEnum te) will give you the list of
>>>>>>> documents that match a term.
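The answer above amounts to un-inverting the field: walk its terms in order and, for each term, walk the documents that contain it, recording docId to value. The following is a stdlib-only sketch of that loop, not Lucene's API. It assumes a `SortedMap<String, int[]>` as a stand-in for one field's term dictionary (what `IndexReader.terms(Term)` iterates) and each `int[]` as that term's postings (what `termDocs` returns); `Uninvert.uninvert` is a hypothetical name.

```java
import java.util.Map;
import java.util.SortedMap;

final class Uninvert {
    /**
     * Builds a docId -> term array for one field (sketch). `postings` maps each
     * indexed term, in sorted order, to the ids of the docs containing it.
     * Docs with no term at or after startTerm stay null.
     */
    static String[] uninvert(SortedMap<String, int[]> postings, String startTerm, int maxDoc) {
        String[] valueByDoc = new String[maxDoc];
        // tailMap plays the role of a TermEnum positioned at startTerm.
        for (Map.Entry<String, int[]> entry : postings.tailMap(startTerm).entrySet()) {
            for (int doc : entry.getValue()) {
                valueByDoc[doc] = entry.getKey(); // the doc id comes from the postings
            }
        }
        return valueByDoc;
    }
}
```

For a multi-valued field this sketch keeps only the last term seen per document; building an RTree would instead collect every value for each doc.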
>>>>>>>
>>>>>>>
>>>>>>> -Yonik
>>>>>>
>>>>>
>>>>
>>>>
>>>> -- 
>>>> - Mark
>>>>
>>>> http://www.lucidimagination.com
>>>>
>>>>
>>>>
>>>
>>
>>
>> -- 
>> - Mark
>>
>> http://www.lucidimagination.com
>>
>>
>>
>


-- 
- Mark

http://www.lucidimagination.com



