lucene-dev mailing list archives

From "patrick o'leary" <pj...@pjaol.com>
Subject Re: ReadOnlyMultiSegmentReader bitset id vs doc id
Date Wed, 29 Apr 2009 04:17:16 GMT
OK, with some pointers from Ryan, I finally figured out the last problem.
So, as a note to anyone else who might encounter the same problems with
multireader:

A) A directory can contain multiple segments, with a reader for each of those segments
B) A search is replayed within each segment reader in a serial fashion **
C) If you are using FieldCache / BitSet or anything else tied to a document's position
within a reader, and you need the docId
   -- document id = (sum of previous readers' maxDoc) + bitset position

e.g.
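// Sum of maxDoc() over the segment readers already processed.
// Relies on (B): the filter is called once per segment reader, in order.
int docBase = 0;

public DocIdSet getDocIdSet(IndexReader reader) {

   OpenBitSet bitset = new OpenBitSet(reader.maxDoc());
   for (int i = 0; i < reader.maxDoc(); i++)  {
        .....
        .... filter stuff ....
        ....
        if ( good ) {
           bitset.set( i );            // position within this segment reader

           int docId = i + docBase;    // index-wide document id
           ...........
        }
   }

  docBase += reader.maxDoc();          // advance for the next segment reader
  .......
}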
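
For anyone who wants to check the arithmetic in (C) without an index at hand, here is a minimal standalone sketch. It does not use Lucene; the names DocBaseSketch, docBases and globalDocId are hypothetical and made up for illustration, and the maxDoc values are invented. It assumes the segment readers are visited in order, as in (B).

```java
/** Hypothetical helper, not part of Lucene: maps a per-segment position to an index-wide doc id. */
class DocBaseSketch {

    /** starts[k] is the sum of maxDocs[0..k-1], i.e. the doc id of segment k's first document. */
    public static int[] docBases(int[] maxDocs) {
        int[] starts = new int[maxDocs.length];
        int running = 0;
        for (int k = 0; k < maxDocs.length; k++) {
            starts[k] = running;      // base for this segment
            running += maxDocs[k];    // advance past this segment's documents
        }
        return starts;
    }

    /** document id = (sum of previous readers' maxDoc) + position within the segment. */
    public static int globalDocId(int[] starts, int segment, int position) {
        return starts[segment] + position;
    }

    public static void main(String[] args) {
        int[] maxDocs = {10, 5, 7};               // invented segment sizes
        int[] starts = docBases(maxDocs);         // bases are 0, 10, 15
        System.out.println(globalDocId(starts, 2, 3));   // 15 + 3 = 18
    }
}
```

Precomputing the bases like this avoids keeping a running counter inside the filter, which has to be reset for each search.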


OK, it works. Time for sleep.

P


On Tue, Apr 28, 2009 at 5:44 PM, patrick o'leary <pjaol@pjaol.com> wrote:

> Think I may have found it: the filter runs multiple times, once for each
> segment reader, and I was generating a new map to hold distances each time. So
> only the distances from the
> last segment reader were stored.
>
> Currently it looks like those segmented searches are done serially; in
> Solr, at least, they are.
> I presume the end goal is to make them multi-threaded?
> I'll need to make my map synchronized.
>
>
> On Tue, Apr 28, 2009 at 4:42 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
>
>>  What is the problem exactly? Maybe you use the new Collector API, where
>> the search is done for each segment, so caching does not work correctly?
>>
>>
>>
>> -----
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>> http://www.thetaphi.de
>> eMail: uwe@thetaphi.de
>>   ------------------------------
>>
>> *From:* patrick o'leary [mailto:pjaol@pjaol.com]
>> *Sent:* Tuesday, April 28, 2009 10:31 PM
>> *To:* java-dev@lucene.apache.org
>> *Subject:* ReadOnlyMultiSegmentReader bitset id vs doc id
>>
>>
>>
>> hey
>>
>> I've got a filter that's storing document ids with a geo distance for
>> spatial lucene, using a bitset position for the doc id.
>> However, with a MultiSegmentReader that's no longer going to work.
>>
>> What's the most appropriate way to go from bitset position to doc id now?
>>
>> Thanks
>> Patrick
>>
>
>
