lucene-java-user mailing list archives

From Ryan McKinley <ryan...@gmail.com>
Subject Re: BitSet Filter ArrayIndexOutOfBoundsException?
Date Thu, 16 Apr 2009 00:35:07 GMT
uggg.  So there is no longer a consistent docId I can use in a filter?

I have a quite expensive operation that I am hoping to run only once  
each time the index changes.  Is the

How would I get all the doc ids with a given (stored) field from a  
Reader?  I am trying:

  TermDocs td = reader.termDocs();
  while( td.next() ) {
    int id = td.doc();
    Document doc = searcher.doc( id, selector );
    ...
  }

but the TermDocs returned by termDocs() is always empty (the index is not empty).

Thanks
ryan
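
One possible way to keep the expensive step to once per change, sketched in plain Java with no Lucene types: cache the built structure per segment reader rather than per top-level reader. This assumes unchanged segments keep the same reader instance when the index is reopened, so only new segments trigger a rebuild. PerSegmentCache and its builder function are hypothetical names, not Lucene or Solr classes.

```java
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

/** Hypothetical helper (not a Lucene class): one cached value per segment reader. */
final class PerSegmentCache<R, V> {
    // Weak keys: an entry can be collected once its reader is no longer referenced.
    // The cached value must not hold a strong reference back to the reader.
    private final Map<R, V> cache = new WeakHashMap<>();
    private final Function<R, V> builder;

    PerSegmentCache(Function<R, V> builder) {
        this.builder = builder;
    }

    /** Returns the cached value for this reader, building it on first use. */
    synchronized V get(R segmentReader) {
        V value = cache.get(segmentReader);
        if (value == null) {
            // The expensive step (e.g. building an RTree from stored fields) runs here.
            value = builder.apply(segmentReader);
            cache.put(segmentReader, value);
        }
        return value;
    }
}
```

Inside getDocIdSet(IndexReader reader), the reader passed in would be the cache key, and the ids stored in the cached value would then already be in that reader's docID space.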




On Apr 15, 2009, at 7:41 PM, Uwe Schindler wrote:

> Use the index reader given to getDocIdSet. The ids are only valid for that
> index reader. This is new in Lucene 2.9: filters are executed against each
> segment of an index separately, so the docids of the
> MultiReader/DirectoryIndexReader are different from the local ones.
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
>> -----Original Message-----
>> From: Ryan McKinley [mailto:ryantxu@gmail.com]
>> Sent: Thursday, April 16, 2009 1:34 AM
>> To: java-user@lucene.apache.org
>> Subject: Re: BitSet Filter ArrayIndexOutOfBoundsException?
>>
>> Are you saying a lucene document could have different ids in the
>> MultiReader and the IndexReader?
>>
>> I have assumed that the ids have not changed as long as the
>> lastmodified time has not changed:
>>   long lastmodified = IndexReader.lastModified( reader.directory() );
>> Is this assumption correct?
>>
>> I get the original ids using:
>>
>>     SolrIndexSearcher searcher = ...
>>     DocList docs = searcher.getDocList( new MatchAllDocsQuery(),
>>         (DocSet)null, null, 0, Integer.MAX_VALUE );
>>
>> and assume that nothing has changed as long as:
>>    IndexReader.lastModified( searcher.getReader().directory() );
>> has not changed.
>>
>> Am I missing something?
>>
>> If so, how would I get access to the docId expected by
>> Filter#getDocIdSet()?
>>
>> thanks!
>> ryan
>>
>>
>> On Apr 15, 2009, at 5:41 PM, Michael McCandless wrote:
>>
>>> Maybe it's because you're using the MultiReader docID space but
>>> getDocIdSet(IndexReader) expects you to use the docID space for that
>>> IndexReader (ie, a single segment)?
>>>
>>> Mike
>>>
>>> On Wed, Apr 15, 2009 at 1:37 PM, Ryan McKinley <ryantxu@gmail.com>
>>> wrote:
>>>> I am working on a Filter that uses an RTree to test for inclusion. This
>>>> Filter works great *most* of the time -- if the index is optimized, it
>>>> works all of the time.  I feel like I am missing something basic, but
>>>> not sure what it could be.
>>>>
>>>> Each time the reader opens (and the index has changed), I build an RTree
>>>> from stored fields.  The RTree holds the lucene document ID and is later
>>>> used in a Filter/Query.  This is how I build the RTree:
>>>>
>>>> FieldSelector selector = new MapFieldSelector( new String[] { "extent" } );
>>>> DocIterator iter = docs.iterator();
>>>> while( iter.hasNext() ) {
>>>>   int id = iter.nextDoc();
>>>>   Document doc = searcher.doc( id, selector );
>>>>   Fieldable ff = doc.getFieldable( "extent" );
>>>>   if( ff != null && !reader.isDeleted( id ) ) {
>>>>     ... add the id to the RTree ...
>>>>   }
>>>> }
>>>>
>>>> In the Filter, I query my RTree and add the results to a BitSet:
>>>>
>>>> public DocIdSet getDocIdSet(IndexReader reader) throws IOException
>>>> {
>>>>   final BitSet bits = new BitSet();
>>>>
>>>>   // ... query the RTree adding matching ids to the BitSet...
>>>>   bits.set( id );
>>>>
>>>>   return new DocIdBitSet( bits );
>>>> }
>>>>
>>>> When things go wrong, I get an error like this:
>>>>
>>>> java.lang.ArrayIndexOutOfBoundsException: 67
>>>>    at org.apache.lucene.util.OpenBitSet.fastSet(OpenBitSet.java:242)
>>>>    at org.apache.solr.search.DocSetHitCollector.collect(DocSetHitCollector.java:63)
>>>>    at org.apache.lucene.search.IndexSearcher$MultiReaderCollectorWrapper.collect(IndexSearcher.java:313)
>>>>    at org.apache.lucene.search.Scorer.score(Scorer.java:58)
>>>>    at org.apache.lucene.search.IndexSearcher.doSearch(IndexSearcher.java:262)
>>>>    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:250)
>>>>    at org.apache.lucene.search.Searcher.search(Searcher.java:126)
>>>>    at org.apache.solr.search.SolrIndexSearcher.getDocSetNC(SolrIndexSearcher.java:691)
>>>>    at org.apache.solr.search.SolrIndexSearcher.getDocSet(SolrIndexSearcher.java:597)
>>>>    at org.apache.solr.search.SolrIndexSearcher.getDocSet(SolrIndexSearcher.java:633)
>>>>    at org.apache.solr.search.SolrIndexSearcher.getDocListAndSetNC(SolrIndexSearcher.java:1154)
>>>>    at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:924)
>>>>    at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:345)
>>>>    at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:171)
>>>>
>>>> I'm guessing it is referencing a deleted document or something like that,
>>>> but I figured the && !reader.isDeleted( id ) clause would take care of
>>>> that.
>>>>
>>>> Any pointers would be great!
>>>>
>>>> Thanks
>>>> Ryan
>>>>
>>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>
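
The per-segment docID space that Uwe and Mike describe above can be sketched in plain Java, without any Lucene classes. The idea is that each segment starts at an offset (its doc base) within the top-level reader, so a top-level id equals the doc base plus the segment-local id. SegmentDocIds and its methods are hypothetical names for illustration only, and the segment sizes in the usage note are made up.

```java
import java.util.BitSet;

/** Hypothetical helper (not part of Lucene): maps top-level docIDs to segment-local ones. */
final class SegmentDocIds {
    /** starts[i] is the doc base of segment i; the last entry is the total maxDoc. */
    static int[] docBases(int[] segmentMaxDocs) {
        int[] starts = new int[segmentMaxDocs.length + 1];
        for (int i = 0; i < segmentMaxDocs.length; i++) {
            starts[i + 1] = starts[i] + segmentMaxDocs[i];
        }
        return starts;
    }

    /** Index of the segment that contains the given top-level docID. */
    static int segmentOf(int globalId, int[] starts) {
        if (globalId < 0 || globalId >= starts[starts.length - 1]) {
            throw new IndexOutOfBoundsException("docID out of range: " + globalId);
        }
        // Linear scan is fine here: an index normally has only a handful of segments.
        for (int i = 0; i < starts.length - 1; i++) {
            if (globalId < starts[i + 1]) {
                return i;
            }
        }
        throw new IllegalStateException("unreachable for a valid starts array");
    }

    /** Copies the bits that fall in one segment, re-based so bit 0 is its first doc. */
    static BitSet localBits(BitSet globalBits, int[] starts, int segment) {
        int base = starts[segment];
        int end = starts[segment + 1];
        BitSet local = new BitSet(end - base);
        for (int id = globalBits.nextSetBit(base); id >= 0 && id < end;
                 id = globalBits.nextSetBit(id + 1)) {
            local.set(id - base);
        }
        return local;
    }
}
```

With three segments of 100, 50 and 25 documents, top-level id 120 falls in the second segment as local id 20. A filter that sets bit 120 in the BitSet it returns for that segment is addressing a document that does not exist there, which is the kind of mismatch that could lead to the exception reported in this thread.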

