lucene-java-user mailing list archives

From "Joe MA" <>
Subject RE: No subsearcher in Lucene 3.3?
Date Wed, 31 Aug 2011 09:27:05 GMT
Thanks, I will give this a try; it seems like it should work for my case.

-----Original Message-----
From: Uwe Schindler [] 
Sent: Tuesday, August 30, 2011 3:53 PM
Subject: RE: No subsearcher in Lucene 3.3?


Use ReaderUtil from the o.a.l.util package, which does the recursive
traversal of the reader tree. It has methods to solve this problem. You can
cache the int[] start array that contains the starting document id for each
subreader. This makes it possible to remap the document ids with a standard
TopDocs-based search, without Collectors (which should not be required for
your case).
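
A rough illustration of the cached start array, as a standalone sketch that
does not call Lucene itself. The class and method names are hypothetical, and
the maxDocs values stand in for whatever maxDoc() returns for each direct
subreader:

```java
// Standalone sketch: no Lucene classes are used here. The int[] maxDocs
// argument is an assumption standing in for the maxDoc() value of each
// direct subreader, in order.
class DocStartsSketch {

    // starts[i] is the first global doc id of subreader i;
    // the last element is the total number of doc ids.
    static int[] buildStarts(int[] maxDocs) {
        int[] starts = new int[maxDocs.length + 1];
        for (int i = 0; i < maxDocs.length; i++) {
            starts[i + 1] = starts[i] + maxDocs[i];
        }
        return starts;
    }

    // Returns the index of the subreader that contains the global docId,
    // i.e. the largest i with starts[i] <= docId.
    static int subIndex(int docId, int[] starts) {
        int total = starts[starts.length - 1];
        if (docId < 0 || docId >= total) {
            throw new IllegalArgumentException("docId out of range: " + docId);
        }
        int lo = 0;
        int hi = starts.length - 2; // last valid subreader index
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (starts[mid] <= docId) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        int[] starts = buildStarts(new int[] {10, 5, 20});
        int docId = 12; // e.g. a ScoreDoc.doc value from TopDocs
        int sub = subIndex(docId, starts);
        // prints "subreader 1, local doc 2"
        System.out.println("subreader " + sub + ", local doc " + (docId - starts[sub]));
    }
}
```

The array only needs to be rebuilt when the readers are reopened.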

For this issue you are not interested in stepping recursively into the
reader tree down to the lowest level (non-optimized subindexes will also
expand to multiple readers), so the only thing you need to know is which
direct subreader of the MultiReader a hit came from. For a quick lookup, one
approach is to iterate *once*, before searching, over the direct subreaders
of the MultiReader (without recursion) and sum up the maxDoc() (not
numDocs()!) return values. For each subreader, put the running sum so far
(starting with 0 for the first subreader) as the key into a TreeMap (!!!),
with the target index name, or whatever you need to identify the subreader,
as the value. You can then look up each docId from the TopDocs object using
TreeMap.floorEntry(docId).getValue() (requires Java 6 or later).
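
The TreeMap lookup described above could be sketched roughly as follows. This
is a standalone illustration without Lucene classes; the class name, method
names, and collection names are hypothetical, and the maxDocs values stand in
for the maxDoc() of each direct subreader:

```java
import java.util.Map;
import java.util.TreeMap;

// Standalone sketch: names[i] identifies subreader i, and maxDocs[i] is an
// assumption standing in for that subreader's maxDoc().
class SubReaderNamesSketch {

    // Key: first global doc id of a subreader. Value: its name.
    static TreeMap<Integer, String> buildOffsets(String[] names, int[] maxDocs) {
        TreeMap<Integer, String> offsets = new TreeMap<Integer, String>();
        int start = 0;
        for (int i = 0; i < names.length; i++) {
            // An empty subreader would share its key with the next one,
            // so it is skipped; it can never produce a hit anyway.
            if (maxDocs[i] > 0) {
                offsets.put(start, names[i]);
            }
            start += maxDocs[i];
        }
        return offsets;
    }

    // floorEntry returns the entry with the greatest key <= docId.
    // Note that no upper bound is checked: a docId past the end of the
    // index maps to the last subreader.
    static String nameFor(TreeMap<Integer, String> offsets, int docId) {
        Map.Entry<Integer, String> entry = offsets.floorEntry(docId);
        if (entry == null) {
            throw new IllegalArgumentException("no subreader for docId " + docId);
        }
        return entry.getValue();
    }

    public static void main(String[] args) {
        TreeMap<Integer, String> offsets = buildOffsets(
                new String[] {"collectionA", "collectionB"},
                new int[] {100, 50});
        // prints "collectionB"
        System.out.println(nameFor(offsets, 120));
    }
}
```

As with any cached offsets, the map would need to be rebuilt whenever the
indexes are reopened.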


Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen

> -----Original Message-----
> From: Devon H. O'Dell []
> Sent: Tuesday, August 30, 2011 8:04 PM
> To:
> Subject: Re: No subsearcher in Lucene 3.3?
> 2011/8/30 Joe MA <>:
> > When searching a single collection, no problem.  But if I want to search
> > two collections at the same time, I need to know which collection the hit
> > came from so I can retrieve the base_path from the database.  These
> > base_paths may be different.  As mentioned, this was trivial in Lucene 1.x
> > and 2.x as I grabbed the subsearcher from the result, which would for
> > example return a 1 or 2 indicating which of the two collections the result
> > came from.  Then I build the path to the file.  In other words, subsearcher
> > gave me the foreign key I needed to map to additional external information
> > associated with each collection during a multisearch.  That is now gone in
> > Lucene 3.3.
>
> You could use the suggestion I made of doing the loop over the IndexReader
> subReaders (recursively until you get to the SegmentReaders) and use a
> HashMap<SegmentReader, String> (or similar container structure) to
> associate the segments to a path. It sounds like your application doesn't
> reopen indexes with much frequency, which is good: you will need to
> regenerate this map any time you reopen your indexes.
>
> When Collector.setNextReader is called, you can simply get (at that point)
> the String associated with the particular SegmentReader you're working
> with. Then, every time Collector.collect is called, you can tack that on to
> whatever data structure you're using to get at your documents. It doesn't
> have to be high memory overhead if you make sure the strings are interned.
>
> Perhaps Uwe or other Lucene devs have better ideas for approaching this;
> they often do :)
> --dho
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
