lucene-java-user mailing list archives

From Paul Allan Hill <p...@metajure.com>
Subject recording a universal ID from DocID in a CustomScoreQuery
Date Sat, 04 Feb 2012 00:09:57 GMT
My index does NOT have a simple UID; it uses the file PATH as the unique key.
I was implementing a CustomScoreQuery that not only tweaks the score but also records which documents passed through this part of the overall rebuilt query, so that I could do further processing on those particular documents later.
I was hoping to do this without loading all the PATHs from my index into a field cache, but maybe that is a false way to try to save memory.

I thought I could write down the docId provided in the call to customScore:

  public float customScore(int doc, float subQueryScore, float valSrcScore) throws IOException {
    // Record the document number handed to this scorer.
    docIds.add(doc);
    return ...;
  }

  private Set<Integer> docIds = new HashSet<Integer>();

While I thought I had this working, apparently I had not taken the subreader and segment problem into consideration.
The int called doc is not the docId for the entire index, just the local reader's doc number. Is that right?
So is there a standard way to convert back to the index-wide docId?
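To make the arithmetic concrete, here is a minimal sketch in plain Java of what I mean by converting back: the index-wide number would be the segment's starting offset plus the local doc number. GlobalDocRecorder and its methods are hypothetical names for illustration, not Lucene classes, and the sketch assumes the starting offset (docBase) of the current segment is somehow known at scoring time.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of Lucene: records index-wide doc IDs
// given the starting offset (doc base) of the segment being scored.
final class GlobalDocRecorder {
    private final Set<Integer> docIds = new HashSet<Integer>();

    // docBase:  number of documents in all segments that precede this one
    //           (assumed to be known by the caller).
    // localDoc: the segment-relative number passed to customScore.
    void record(int docBase, int localDoc) {
        docIds.add(docBase + localDoc);
    }

    // Returns the index-wide doc IDs recorded so far.
    Set<Integer> recorded() {
        return docIds;
    }
}
```

With this, local doc 3 in the first segment and local doc 3 in a segment that starts at offset 100 are stored as two distinct entries rather than colliding.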

If there is no standard way, I _might_ create a small subclass of IndexSearcher and provide a method to:

(1)    Find the right reader by looping through all of IndexSearcher.subReaders[] to find which reader called the CustomScoreQuery

(2)    Add the proper offset from IndexSearcher.docStarts[iReader]

But I am thinking this is prone to the problem that a subreader can itself be made of more subreaders, etc., so I really don't have a clue where to find the current reader and then map back to docStarts.
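The two steps above, including the nested-subreader worry, can be sketched in plain Java as follows. ReaderNode and DocStarts are hypothetical stand-ins for illustration, not Lucene types; the sketch assumes every reader is either a leaf with a known document count or a composite of other readers, and that the leaf doing the scoring can be identified by reference.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a reader: a leaf holds maxDoc documents,
// a composite holds child readers (which may themselves be composites).
final class ReaderNode {
    final int maxDoc;
    final List<ReaderNode> children;

    ReaderNode(int maxDoc) {
        this.maxDoc = maxDoc;
        this.children = new ArrayList<ReaderNode>();
    }

    ReaderNode(List<ReaderNode> children) {
        this.children = children;
        int sum = 0;
        for (ReaderNode c : children) {
            sum += c.maxDoc;
        }
        this.maxDoc = sum;
    }
}

final class DocStarts {
    // Flatten the reader tree into its leaves, in index order.
    // A node with no children is treated as a leaf.
    static void gatherLeaves(ReaderNode node, List<ReaderNode> out) {
        if (node.children.isEmpty()) {
            out.add(node);
            return;
        }
        for (ReaderNode c : node.children) {
            gatherLeaves(c, out);
        }
    }

    // starts[i] is the index-wide doc ID of the first document in leaf i.
    static int[] docStarts(List<ReaderNode> leaves) {
        int[] starts = new int[leaves.size()];
        int base = 0;
        for (int i = 0; i < leaves.size(); i++) {
            starts[i] = base;
            base += leaves.get(i).maxDoc;
        }
        return starts;
    }

    // Step (1): find the current leaf by identity; step (2): add its offset.
    static int globalDoc(List<ReaderNode> leaves, int[] starts,
                         ReaderNode current, int localDoc) {
        for (int i = 0; i < leaves.size(); i++) {
            if (leaves.get(i) == current) {
                return starts[i] + localDoc;
            }
        }
        throw new IllegalArgumentException("reader is not a leaf of this index");
    }
}
```

Flattening first means the offsets are computed over leaves only, so the depth of nesting no longer matters; the open question is still how to learn which leaf is the current one inside customScore.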

I also suspect I'm doing this wrong, because ReaderUtil seems to have nothing like this.

Is there some way to note for later that a particular document came through this function query, or should I just accept using the field cache?

-Paul




