lucene-java-user mailing list archives

From "Uwe Schindler" <>
Subject RE: No subsearcher in Lucene 3.3?
Date Tue, 30 Aug 2011 19:53:18 GMT

Use ReaderUtil from the o.a.l.util package, which does the recursive traversal
of the reader tree. It has methods to solve these problems. You can cache the
int[] start array that contains the starting document id of each
subreader. This makes it possible to remap the document ids from a standard
TopDocs-based search, without Collectors (which should not be required for
your case).
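
To illustrate the start-array idea, here is a minimal plain-Java sketch. It does not call ReaderUtil itself: the class DocStarts and its methods buildStarts and subIndex are hypothetical names, and the int[] of maxDoc values stands in for what the real subreaders would report.

```java
import java.util.Arrays;

public class DocStarts {
    /** starts[i] is the first global document id that belongs to subreader i. */
    static int[] buildStarts(int[] maxDocs) {
        int[] starts = new int[maxDocs.length];
        int sum = 0;
        for (int i = 0; i < maxDocs.length; i++) {
            starts[i] = sum;      // offset before adding this subreader
            sum += maxDocs[i];    // maxDoc, not numDocs: deleted docs keep their ids
        }
        return starts;
    }

    /** Index of the subreader that contains the given global document id. */
    static int subIndex(int docId, int[] starts) {
        int pos = Arrays.binarySearch(starts, docId);
        if (pos >= 0) {
            // Exact hit on a start offset; empty subreaders share the same
            // offset, so move to the last one that starts here.
            while (pos + 1 < starts.length && starts[pos + 1] == docId) {
                pos++;
            }
            return pos;
        }
        // Not found: binarySearch returns -(insertionPoint) - 1, and the
        // owning subreader is the one just before the insertion point.
        return -pos - 2;
    }
}
```

The id local to the subreader is then docId - starts[subIndex(docId, starts)]. The array only has to be rebuilt when the readers are reopened.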

For this issue, however, you are not interested in stepping recursively into
the reader tree down to the lowest level (non-optimized subindexes would also
expand to multiple readers). The only thing you want to know is which direct
subreader of the MultiReader a hit belongs to. For a quick lookup, one
approach is to iterate *once* before searching over the direct subreaders of
the MultiReader (without recursion) and sum up the maxDoc() (not numDocs()!)
return values. For each subreader, put the sum of the preceding subreaders
(starting with 0 for the first one) as the key into a TreeMap (!!!), with the
target index name or whatever you need to identify the subreader as the
value. You can then look up the docid from the TopDocs object using
TreeMap.floorEntry(docId).getValue() (Java 6 or later).
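
The TreeMap lookup described above could be sketched roughly as follows. This is plain Java under stated assumptions: SubReaderLookup, buildIndex and nameFor are hypothetical names, and the names and maxDocs arrays stand in for the direct subreaders of the MultiReader and their maxDoc() values.

```java
import java.util.TreeMap;

public class SubReaderLookup {
    /** Maps the first global doc id of each direct subreader to its name. */
    static TreeMap<Integer, String> buildIndex(String[] names, int[] maxDocs) {
        TreeMap<Integer, String> startToName = new TreeMap<Integer, String>();
        int start = 0; // the first subreader starts at doc id 0
        for (int i = 0; i < names.length; i++) {
            // An empty subreader shares its start with the next one; the
            // later put() overwrites it, which is the entry we want.
            startToName.put(start, names[i]);
            start += maxDocs[i];
        }
        return startToName;
    }

    /** Name of the subreader that owns the given global doc id. */
    static String nameFor(TreeMap<Integer, String> startToName, int docId) {
        // floorEntry returns the entry with the greatest key <= docId.
        return startToName.floorEntry(docId).getValue();
    }
}
```

With two collections of 100 and 50 documents, doc ids 0 to 99 resolve to the first name and 100 to 149 to the second. The map is built once per opened reader and reused for every hit in the TopDocs.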


Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen

> -----Original Message-----
> From: Devon H. O'Dell []
> Sent: Tuesday, August 30, 2011 8:04 PM
> To:
> Subject: Re: No subsearcher in Lucene 3.3?
> 2011/8/30 Joe MA <>:
> > When searching a single collection, no problem.  But if I want to search
> > two collections at the same time, I need to know which collection the hit
> > came from so I can retrieve the base_path from the database.  These
> > base_paths may be different.  As mentioned, this was trivial in Lucene 1.x
> > and 2.x as I grabbed the subsearcher from the result, which would for
> > example return a 1 or 2 indicating which of the two collections the result
> > came from.  Then I build the path to the file.  In other words,
> > subsearcher gave me the foreign key I needed to map to additional external
> > information associated with each during a multisearch.  That is now gone
> > in Lucene 3.3.
>
> You could use the suggestion I made of doing the loop over the IndexReader
> subReaders (recursively until you get to the SegmentReaders) and use a
> HashMap<SegmentReader, String> (or similar container structure) to
> associate the segments to a path. It sounds like your application doesn't
> reopen indexes with much frequency, which is good: you will need to
> regenerate this map any time you reopen your indexes.
>
> When Collector.setNextReader is called, you can simply get (at that
> point) the String associated with the particular SegmentReader you're
> working with. Then, every time Collector.collect is called, you can tack
> that on to whatever data structure you're using to get at your documents.
> It doesn't have to be high memory overhead if you make sure the strings
> are interned.
>
> Perhaps Uwe or other Lucene devs have better ideas for approaching this;
> they often do :)
>
> --dho
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail: