lucene-java-user mailing list archives

From Paul Allan Hill <p...@metajure.com>
Subject recording a universal ID from DocID in a CustomScoreQuery
Date Sat, 04 Feb 2012 00:09:57 GMT
My index does NOT have a simple UID; it uses the file PATH as the unique key.
I was implementing a CustomScoreQuery that not only tweaks the score but also records
which documents passed through this part of the overall rebuilt query, so that I could
do further processing on those particular documents later.
I was hoping to do this without loading all the PATHs from my index into a field cache,
but maybe that is a false way to try to save memory.

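I thought I could record the doc ID passed in the call to customScore:

import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

// Doc numbers seen by customScore so far.
private Set<Integer> docIds = new HashSet<Integer>();

public float customScore(int doc, float subQueryScore, float valSrcScore) throws IOException
{
    docIds.add(doc);  // record the doc number handed to this call
    return ...;
}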

While I thought I had this working, I apparently had not taken the subreader and segment
problem into consideration.
The int called doc is not the doc ID for the entire index, just the doc number local to the
current reader. Is that right?
If so, is there a standard way to convert it back to the index-wide doc ID?
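
A minimal sketch of the offset arithmetic behind this question, using plain arrays rather than Lucene classes. The class and method names (DocBaseSketch, docStarts, toGlobal, segmentOf) are hypothetical, and the sketch assumes every segment holds at least one document, so that the start offsets are strictly increasing:

```java
import java.util.Arrays;

public class DocBaseSketch {
    // starts[i] is the index-wide doc number of the first document in segment i,
    // computed as the running sum of the sizes of the segments before it.
    public static int[] docStarts(int[] maxDocs) {
        int[] starts = new int[maxDocs.length];
        int base = 0;
        for (int i = 0; i < maxDocs.length; i++) {
            starts[i] = base;
            base += maxDocs[i];
        }
        return starts;
    }

    // Segment-local doc number -> index-wide doc number.
    public static int toGlobal(int[] starts, int segment, int localDoc) {
        return starts[segment] + localDoc;
    }

    // Index-wide doc number -> segment that contains it.
    // Assumes no empty segments (no duplicate entries in starts).
    public static int segmentOf(int[] starts, int globalDoc) {
        int pos = Arrays.binarySearch(starts, globalDoc);
        // When not found, binarySearch returns -(insertionPoint) - 1;
        // the containing segment is the one just before the insertion point.
        return pos >= 0 ? pos : -pos - 2;
    }
}
```

The point of the sketch is that the local number alone is not enough: the segment (or its start offset) has to be known at the moment the doc number is recorded.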

If there is no standard way, I _might_ create a small subclass of IndexSearcher and provide
a method to:

(1)    Find the right reader by looping through IndexSearcher.subReaders[] to find which
reader called the CustomScoreQuery

(2)    Add the corresponding offset from IndexSearcher.docStarts[iReader]

But I am thinking this is prone to the problem that a subreader can itself be made of more
subreaders, etc., so I really don't have a clue how to find the current reader and then map
back to docStarts.
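
To illustrate the nesting concern, here is a hedged sketch that flattens a tree of readers into its leaves and computes one start offset per leaf. Node is a hypothetical stand-in for a reader, not a Lucene class, and LeafFlattenSketch, gatherLeaves, and leafDocStarts are names made up for the example; Lucene's own utilities may already provide an equivalent.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LeafFlattenSketch {
    // Hypothetical stand-in for a reader: a leaf holding maxDoc documents,
    // or a composite made of child readers.
    public static final class Node {
        final int maxDoc;           // meaningful only for leaves
        final List<Node> children;  // empty for leaves

        public Node(int maxDoc) {
            this.maxDoc = maxDoc;
            this.children = new ArrayList<Node>();
        }

        public Node(Node... children) {
            this.maxDoc = 0;
            this.children = Arrays.asList(children);
        }

        boolean isLeaf() {
            return children.isEmpty();
        }
    }

    // Depth-first walk that collects leaves in document order.
    public static void gatherLeaves(Node node, List<Node> out) {
        if (node.isLeaf()) {
            out.add(node);
            return;
        }
        for (Node child : node.children) {
            gatherLeaves(child, out);
        }
    }

    // One start offset per leaf, regardless of how deeply the leaves are nested.
    public static int[] leafDocStarts(Node root) {
        List<Node> leaves = new ArrayList<Node>();
        gatherLeaves(root, leaves);
        int[] starts = new int[leaves.size()];
        int base = 0;
        for (int i = 0; i < starts.length; i++) {
            starts[i] = base;
            base += leaves.get(i).maxDoc;
        }
        return starts;
    }
}
```

Once the tree is flattened this way, the position of a leaf in the list is the index to use when looking up its offset, so nested composites need no special handling.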

I also suspect I'm going about this the wrong way, because ReaderUtil has nothing like this?

Is there some way to note for later that a particular document came through this function
query, or should I just accept using the field cache?

-Paul




