lucene-java-user mailing list archives

From Paul Allan Hill <>
Subject RE: recording a universal ID from DocID in a CustomScoreQuery
Date Mon, 06 Feb 2012 23:12:06 GMT
To close out this thread: I read the document itself with a single-field FieldSelector, so as not
to load anything but exactly what I needed at this point in the code (in particular, not the
text body).

Then I saved the primary key (the path) of each document that passed through this CustomScoreQuery
(function query) in a Set<String> called seenDocs:

                seenDocs.add(reader.document(docId, fieldSelector).getFieldable(KEY_FIELD).stringValue());

If we do introduce a short, globally unique ID field, the code needs little change to switch to
that field.

Once the entire query has gathered all the results, the code asks which ones came through that
function query by consulting seenDocs.
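A minimal sketch of that record-then-check pattern, in plain Java so it runs without Lucene. The class and method names (`SpecialDocTracker`, `recordVisit`, `isSpecial`, `specialResults`) are hypothetical, and the string keys stand in for whatever KEY_FIELD holds (the path today, perhaps a short unique ID later):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: collects the primary keys of documents seen while the
// function query scores, then flags final results whose key was recorded.
public class SpecialDocTracker {
    // Keys (e.g. paths) of documents that passed through the function query.
    private final Set<String> seenDocs = new HashSet<>();

    // Called from the scoring code once per document the function query visits.
    public void recordVisit(String key) {
        seenDocs.add(key);
    }

    // Called after the whole query has produced its results.
    public boolean isSpecial(String key) {
        return seenDocs.contains(key);
    }

    // Returns the result keys that came through the function query,
    // preserving the original result order.
    public List<String> specialResults(List<String> resultKeys) {
        List<String> special = new ArrayList<>();
        for (String key : resultKeys) {
            if (seenDocs.contains(key)) {
                special.add(key);
            }
        }
        return special;
    }

    public static void main(String[] args) {
        SpecialDocTracker tracker = new SpecialDocTracker();
        tracker.recordVisit("/docs/b.txt");
        List<String> results = Arrays.asList("/docs/a.txt", "/docs/b.txt", "/docs/c.txt");
        System.out.println(tracker.specialResults(results)); // prints [/docs/b.txt]
    }
}
```

A fresh tracker per search keeps keys from an earlier query from leaking into the next one; if searches can run on several threads at once, a concurrent set (for example `ConcurrentHashMap.newKeySet()`) would be the safer choice than `HashSet`.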

I decided NOT to use the FieldCache for this particular application, because the documents matched
by this part of the query are very few compared to all documents in the index. Their rarity was the
point of knowing: it lets me mark those results as 'special' for other parts of the application.
Such special documents get different treatment in the UI, but that's not my concern; identifying
them was the useful part for the index layer.

As usual, thanks for the feedback.


> -----Original Message-----
> From: Ian Lea []
> Sent: Monday, February 06, 2012 3:54 AM
> To:
> Subject: Re: recording a universal ID from DocID in a CustomScoreQuery
> int doc will be for the subreader, not for the entire index.
> has setNextReader(IndexReader reader, int docBase) which you might
> somehow be able to use.  Failing that I'd go for FieldCache, or store
> the docids in a Set in a Map keyed by current Reader, if that would
> give you what you needed for the subsequent messing around.
> --
> Ian.
> On Sat, Feb 4, 2012 at 12:09 AM, Paul Allan Hill <> wrote:
> > My index does NOT have a simple UID; it uses the file PATH as the
> > unique key.
> > I was implementing a CustomScoreQuery that not only tweaked the score
> > but also needed to record which documents had passed through this part
> > of the overall rebuilt query, so that I could further mess with those
> > particular documents later.
> > I was hoping to do it without loading all PATHs from my index into a
> > field cache, but maybe that is a false way to try to save memory.
> >
> > I thought I could record the doc id passed to customScore:
> >
> > public float customScore(int doc, float subQueryScore, float valSrcScore)
> >         throws IOException {
> >     docIds.add(doc);  // doc is local to the current segment reader
> >     return ...;
> > }
> >
> > private Set<Integer> docIds = new HashSet<Integer>();
> >
> > While I thought I had this working, apparently I had not taken the
> > subreader and segment problem into consideration.
> > The int called doc is not the docId for the entire index, just the
> > local reader's doc number.  Is that right?
> > So is there a standard way to convert back to the index-wide DocID?
> >
> > If there is no standard way, I _might_ create a small subclass of
> > IndexSearcher and provide a method to:
> >
> > (1)    Find the right reader by looping through all
> >        IndexSearcher.subReaders[] to find which reader called the
> >        CustomScoreQuery
> >
> > (2)    Add an offset of the proper value from
> >        IndexSearcher.docStarts[iReader]
> >
> > But I'm thinking this is prone to the problem that a subreader can be
> > made of more subreaders, etc., so I really don't have a clue where to
> > find the current reader and then map back to docStarts.
> >
> > I also think I'm doing this wrong, because ReaderUtil has nothing like this?
> >
> > Is there some way to note for later that a particular document came
> > through this function query, or should I just accept using the field
> > cache?
> >
> > -Paul
> >
> >
> >
> >
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
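For completeness, the docid route discussed in the quoted messages comes down to offset arithmetic: an index-wide doc id is the segment's start offset (the `docBase`, or that sub-reader's `docStarts` entry) plus the segment-local `doc`. The plain-Java sketch below models only that bookkeeping, combining Ian's "Set in a Map keyed by current Reader" with steps (1) and (2) from the original question. `SegmentDocRecorder` and its methods are hypothetical, the type parameter stands in for the per-segment reader, and it assumes the readers seen during scoring are exactly the leaf-level readers listed in `subReaders` (nested sub-readers would first have to be flattened):

```java
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; R stands in for the per-segment reader type.
public class SegmentDocRecorder<R> {
    // Segment-local doc ids, grouped by the segment reader that produced them.
    // IdentityHashMap compares readers with ==, not equals().
    private final Map<R, Set<Integer>> localDocs = new IdentityHashMap<>();

    // Called during scoring with the current segment reader and its local doc.
    public void record(R segmentReader, int localDoc) {
        localDocs.computeIfAbsent(segmentReader, r -> new HashSet<>()).add(localDoc);
    }

    // Steps (1) and (2): find the segment reader by identity among the
    // sub-readers, then add that segment's start offset.
    public static <T> int toGlobal(T segmentReader, int localDoc, T[] subReaders, int[] docStarts) {
        for (int i = 0; i < subReaders.length; i++) {
            if (subReaders[i] == segmentReader) {
                return docStarts[i] + localDoc;
            }
        }
        throw new IllegalArgumentException("reader is not one of the known sub-readers");
    }

    // Converts everything recorded so far to sorted index-wide doc ids.
    public Set<Integer> globalDocIds(R[] subReaders, int[] docStarts) {
        Set<Integer> global = new TreeSet<>();
        for (Map.Entry<R, Set<Integer>> e : localDocs.entrySet()) {
            for (int local : e.getValue()) {
                global.add(toGlobal(e.getKey(), local, subReaders, docStarts));
            }
        }
        return global;
    }

    // Inverse direction: index of the segment holding a global doc id.
    // Assumes docStarts is ascending with docStarts[0] == 0. Returns the last
    // segment whose start is <= globalDoc, which skips empty segments.
    public static int segmentIndex(int globalDoc, int[] docStarts) {
        int lo = 0;
        int hi = docStarts.length - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1; // upper middle, so lo always advances
            if (docStarts[mid] <= globalDoc) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }
}
```

The approach Paul settled on sidesteps this bookkeeping: reading KEY_FIELD through the segment reader with the segment-local doc needs no offset at all, because both refer to the same segment.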

