lucene-java-user mailing list archives

From Antony Bowesman
Subject Caching Filters and docIds when using MultiSearcher/IndexSearcher(MultiReader)...
Date Fri, 12 Sep 2008 02:57:40 GMT
Up to now I have only needed to search a single index, but now I will have many 
index shards to search across.  My existing search maintained cached filters for 
the index as well as a cache of my own unique ID fields in the index, keyed by 
Lucene DocId.

Now that I need to search multiple indices, I am trying to work out how to 
continue to use these caches.

I have one index per month of data (up to 10M docs per month) and users can 
search across whichever date range they want, so one search may search Index 
1-->12 (e.g. Jan07-Dec07) and another 13-20 (Jan08-Aug08).

It makes no sense to cache a single bitset generated from a MultiReader over 
indices 1-12 when the next search could be for indices 2-11, because the doc ids 
would shift and all the bits would be useless.  To be of any use, caches, 
including cached BitSets, should therefore contain the doc ids specific to the 
particular index rather than to any particular MultiReader.  Then my Filter 
implementation can determine the real doc id and delegate to a bitset for the 
particular reader instance.
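
A minimal sketch of the per-index caching idea described above, written in plain 
Java with no Lucene dependency. The class ShardFilterCache, the shard names and 
the filter name are all hypothetical; the point is only that each BitSet is keyed 
by the index it came from and holds doc ids local to that index, so it stays 
valid whichever range of indices a search covers.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical per-shard filter cache (not a Lucene class).
 * Each entry holds the bits for one filter on one index shard,
 * using doc ids local to that shard.
 */
class ShardFilterCache {
    private final Map<String, BitSet> cache = new HashMap<String, BitSet>();

    // Combine shard and filter name into a single map key.
    private static String key(String shard, String filterName) {
        return shard + "|" + filterName;
    }

    /** Stores the bits computed for one filter against one shard. */
    void put(String shard, String filterName, BitSet bits) {
        cache.put(key(shard, filterName), bits);
    }

    /** True if bits have already been cached for this shard and filter. */
    boolean isCached(String shard, String filterName) {
        return cache.containsKey(key(shard, filterName));
    }

    /**
     * True if the shard-local doc id is set in the cached bits.
     * Returns false when nothing is cached for this shard and filter.
     */
    boolean matches(String shard, String filterName, int localDocId) {
        BitSet bits = cache.get(key(shard, filterName));
        return bits != null && bits.get(localDocId);
    }
}
```

A search over indices 1-12 and a later search over indices 2-11 would both look 
up the same entries for the indices they share, since no entry depends on where 
a shard sits inside a combined reader.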

This means I need to find the original reader/searcher instance and the 
particular doc Id from that instance to perform bitset checks or cache lookups.

In the MultiSearcher there is subDoc and subSearcher, but there's no such beast 
for an IndexReader to find the real reader/doc from the pseudo one.
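
For illustration, the mapping that subDoc and subSearcher provide can be 
sketched outside Lucene. The sketch assumes the combined reader numbers its 
documents by concatenating the sub-readers in order, so that each sub-reader's 
doc ids are offset by the total maxDoc of the readers before it. The class 
SubReaderResolver and its method names are hypothetical, not part of the Lucene 
API.

```java
/**
 * Hypothetical helper (not a Lucene class) that maps a doc id from a
 * combined reader back to the sub-reader index and the local doc id.
 */
class SubReaderResolver {
    // starts[i] is the first combined doc id of sub-reader i;
    // the last element is the total number of doc ids.
    private final int[] starts;

    /** maxDocs[i] is the maxDoc value of sub-reader i, in reader order. */
    SubReaderResolver(int[] maxDocs) {
        starts = new int[maxDocs.length + 1];
        for (int i = 0; i < maxDocs.length; i++) {
            starts[i + 1] = starts[i] + maxDocs[i];
        }
    }

    /** Index of the sub-reader that owns the combined doc id. */
    int subIndex(int docId) {
        if (docId < 0 || docId >= starts[starts.length - 1]) {
            throw new IllegalArgumentException("doc id out of range: " + docId);
        }
        // Binary search for the last sub-reader whose start is <= docId.
        int lo = 0;
        int hi = starts.length - 2;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (starts[mid] <= docId) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }

    /** Doc id local to the owning sub-reader. */
    int subDoc(int docId) {
        return docId - starts[subIndex(docId)];
    }
}
```

With sub-readers of 10, 5 and 20 documents, combined doc id 14 resolves to 
sub-reader 1 and local doc id 4, which is the pair a per-index cache lookup 
would need.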

This also raises the question of MultiSearcher vs IndexSearcher(MultiReader). 
Even after reading the archives, I am unsure which I should use - there seem to 
be comments in the dev list to avoid MultiSearcher...

Any thoughts or have I spiralled too far into Lucene's depths to see where I am...?

