lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael McCandless <>
Subject Re: Use of AllTermDocs with custom scorer
Date Tue, 17 Nov 2009 16:51:46 GMT
On Tue, Nov 17, 2009 at 8:58 AM, Peter Keegan <> wrote:
> The external data is just an array of fixed-length records, one for each
> Lucene document. Indexes are updated at regular intervals in one jvm. A
> searcher jvm opens the index and reads all the fixed-length records into
> RAM. Given an index-wide docId, the custom scorer can quickly access the
> corresponding fixed-length external data.
> Could you explain a bit more about how mapping the external data to be per
> segment would work? As I said, rebuilding the whole file isn't a big deal
> and the single file keeps the Searcher's use of it simple.

Well, you could use IndexReader.getSequentialSubReaders(), then step
through that array of SegmentReaders, making a seprate external file
for each?

This way, when you reopen your readers, you would only need to make a
new external file for those segments that are new.

But if re-creating the entire file on each reopen isn't a problem for
you then there's no need to change this :)

> With or without a SegmentReader->docBase map (which does sound like a huge
> performance hit), I still don't see how the custom scorer gets the segment
> number. Btw, the custom scorer usually becomes part of a ConjunctionScorer
> (if that matters)

Looks like you already answered this (Lucene asks the Query's weight
for a new scorer one segment at a time).

>>FSHQ expects you to init it with the top-level reader, and then insert
> using top docIDs.
> For sorting, I'm using FSHQ directly with a custom collector that inserts
> docs to the FSHQ. But the custom collector is passed the segment-relative
> docId and the custom comparator needs the index-wide docId. The custom
> collector extends HitCollector. I'm missing where this type of collector
> finds the docBase.

Hmm -- if you are extending HitCollector and passing that to search(),
then the docIDs fed to it should already be top-level docIDs, not
segment relative.


To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message