lucene-java-user mailing list archives

From Michael McCandless <>
Subject Re: Use of AllTermDocs with custom scorer
Date Tue, 17 Nov 2009 10:49:42 GMT
On Mon, Nov 16, 2009 at 6:38 PM, Peter Keegan <> wrote:

>>Can you remap your external data to be per segment?
> That would provide the tightest integration but would require a major
> redesign. Currently, the external data is in a single file created by
> reading a stored field after the Lucene index has been committed. Creating
> this file is very fast with 2.9 (considering the cost of reading all those
> stored fields).

OK.  Though if you update a few docs and open a new reader, you have
to fully recreate the file?  (Or, your app may simply never need to do

>>For your custom sort comparator, are you using FieldComparator?
> I'm using the deprecated FieldSortedHitQueue. I started looking into
> replacing it with FieldComparator, but it was much more involved than I had
> expected, so I postponed. Also, this would only be a partial solution to a
> query with a custom scorer and custom sorter.

Are you using FSHQ directly, yourself (i.e., not via TopFieldDocCollector)?

FSHQ expects you to init it with the top-level reader, and then insert
using top-level docIDs.
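Here is what "top-level docIDs" means concretely. Each segment's docBase is the sum of maxDoc() over the segments before it. A top-level docID is that docBase plus the segment-relative docID. Below is a minimal, self-contained Java sketch with no Lucene dependency. The maxDoc values are made up, and segmentOf is a hypothetical helper, not a Lucene API. It assumes no segment is empty, so the docBases are strictly increasing.

```java
import java.util.Arrays;

/** Sketch: converting between per-segment and top-level docIDs (no Lucene classes used). */
public class TopLevelDocIds {

    /** docBase of segment i = sum of maxDoc() of all earlier segments, in index order. */
    static int[] docBases(int[] maxDocs) {
        int[] bases = new int[maxDocs.length];
        int base = 0;
        for (int i = 0; i < maxDocs.length; i++) {
            bases[i] = base;
            base += maxDocs[i];
        }
        return bases;
    }

    /** Segment-relative docID -> top-level docID (what a queue built on the top-level reader expects). */
    static int toTopLevel(int docBase, int segmentDoc) {
        return docBase + segmentDoc;
    }

    /**
     * Hypothetical helper: top-level docID -> index of the owning segment.
     * Assumes strictly increasing bases (no empty segments).
     */
    static int segmentOf(int[] bases, int topDoc) {
        int idx = Arrays.binarySearch(bases, topDoc);
        // Exact hit: topDoc is the first doc of that segment.
        // Otherwise binarySearch returns -(insertionPoint) - 1; the owner is insertionPoint - 1.
        return idx >= 0 ? idx : -idx - 2;
    }

    public static void main(String[] args) {
        int[] maxDocs = {100, 50, 25};          // three hypothetical segments
        int[] bases = docBases(maxDocs);        // {0, 100, 150}
        int top = toTopLevel(bases[1], 7);      // doc 7 of segment 1
        System.out.println(Arrays.toString(bases) + " " + top + " " + segmentOf(bases, top));
        // prints [0, 100, 150] 107 1
    }
}
```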

>>Failing these, Lucene currently visits the readers in index order.
>>So, you could accumulate the docBase by adding up the reader.maxDoc()
>>for each reader you've seen.  However, this may change in future
>>Lucene releases.
> This would work for the Scorer but not the Sorter, right?

I don't fully understand the question -- the sorter is simply a
Collector impl, and Collector.setNextReader tells you docBase when
the search advances to the next reader.
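The sketch below shows how a collector can use the docBase it receives per segment to read external data stored by top-level docID, such as the single file described above. MiniCollector is a hypothetical stand-in that mirrors the shape of Lucene 2.9's Collector: setNextReader(reader, docBase) is called once per segment, and then collect(doc) is called with segment-relative docIDs. The searcher's calls are simulated by hand, so the example runs without Lucene.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: a collector reading top-level external data via the per-segment docBase. */
public class ExternalDataCollector {

    /** Hypothetical stand-in for Lucene 2.9's Collector (which is an abstract class with more methods). */
    interface MiniCollector {
        void setNextReader(Object segmentReader, int docBase);
        void collect(int doc);
    }

    /** Keeps docs whose external value exceeds a threshold. */
    static class ThresholdCollector implements MiniCollector {
        private final float[] externalValues; // one entry per top-level docID
        private final float threshold;
        private int docBase;                  // updated once per segment
        final List<Integer> hits = new ArrayList<>();

        ThresholdCollector(float[] externalValues, float threshold) {
            this.externalValues = externalValues;
            this.threshold = threshold;
        }

        @Override
        public void setNextReader(Object segmentReader, int docBase) {
            this.docBase = docBase;           // remember where this segment starts
        }

        @Override
        public void collect(int doc) {
            int topDoc = docBase + doc;       // segment-relative -> top-level
            if (externalValues[topDoc] > threshold) {
                hits.add(topDoc);
            }
        }
    }

    public static void main(String[] args) {
        float[] values = {0.1f, 0.9f, 0.2f, 0.8f, 0.95f}; // 5 top-level docs
        ThresholdCollector c = new ThresholdCollector(values, 0.5f);
        // Simulated search: segment "a" holds 3 docs (docBase 0), segment "b" holds 2 (docBase 3).
        c.setNextReader("a", 0);
        for (int doc = 0; doc < 3; doc++) c.collect(doc);
        c.setNextReader("b", 3);
        for (int doc = 0; doc < 2; doc++) c.collect(doc);
        System.out.println(c.hits); // prints [1, 3, 4]
    }
}
```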

>>You could also, externally, build your own map from SegmentReader ->
>>docBase, by calling IndexReader.getSequentialSubReaders() and stepping
>>through adding up the maxDoc.  Then, in your search, you can lookup
>>the SegmentReader you're working on to get the docBase?
> I think this would work for both Scorer and Sorter, right?
> This seems like the best solution right now.

This is a generic solution, but just make sure you don't do the
map lookup for every doc collected, if you can help it, else that'll
slow down your search.
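Here is a minimal sketch of the map approach that follows this advice. It builds the segment -> docBase map once per top-level reader, looks up a segment's docBase once when that segment is handed over, and does no lookup per document. Seg is a hypothetical stand-in for a SegmentReader. In Lucene 2.9 the ordered list would come from IndexReader.getSequentialSubReaders(). The IdentityHashMap is my own choice, because it keys on reader identity rather than equals(). PerSegmentLookup is also hypothetical. The same pattern should fit a custom Scorer (resolve docBase when the per-segment scorer is created) or a sorter (resolve it in setNextReader).

```java
import java.util.Arrays;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

/** Sketch: build segment -> docBase once, then resolve it once per segment, not per doc. */
public class DocBaseMap {

    /** Hypothetical stand-in for a SegmentReader; only identity and maxDoc matter here. */
    static final class Seg {
        final String name;
        final int maxDoc;
        Seg(String name, int maxDoc) { this.name = name; this.maxDoc = maxDoc; }
    }

    /** Step through the sub-readers in order, adding up maxDoc. Call once per top-level reader. */
    static Map<Seg, Integer> buildDocBases(List<Seg> subReaders) {
        Map<Seg, Integer> bases = new IdentityHashMap<>(); // keyed by reader identity
        int base = 0;
        for (Seg s : subReaders) {
            bases.put(s, base);
            base += s.maxDoc;
        }
        return bases;
    }

    /** Hypothetical per-segment helper usable from a scorer or a collector. */
    static final class PerSegmentLookup {
        private final Map<Seg, Integer> docBases;
        private final float[] external;   // external data indexed by top-level docID
        private int docBase;

        PerSegmentLookup(Map<Seg, Integer> docBases, float[] external) {
            this.docBases = docBases;
            this.external = external;
        }

        /** One map lookup per segment... */
        void setSegment(Seg seg) {
            Integer base = docBases.get(seg);
            if (base == null) throw new IllegalStateException("unknown segment " + seg.name);
            docBase = base;
        }

        /** ...and plain array indexing per document. */
        float valueFor(int segmentDoc) {
            return external[docBase + segmentDoc];
        }
    }

    public static void main(String[] args) {
        Seg a = new Seg("_0", 3), b = new Seg("_1", 2);
        Map<Seg, Integer> bases = buildDocBases(Arrays.asList(a, b));
        PerSegmentLookup lookup = new PerSegmentLookup(bases, new float[] {1f, 2f, 3f, 4f, 5f});
        lookup.setSegment(b);
        System.out.println(bases.get(b) + " " + lookup.valueFor(1)); // prints 3 5.0
    }
}
```

Unlike summing maxDoc inside the search, this does not depend on the order in which Lucene visits the segments. It only depends on the order of getSequentialSubReaders().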

