lucene-dev mailing list archives

From "Mark Miller (JIRA)"
Subject [jira] Commented: (LUCENE-1821) Weight.scorer() not passed doc offset for "sub reader"
Date Fri, 21 Aug 2009 12:49:14 GMT


Mark Miller commented on LUCENE-1821:

bq. I'm OK with having to jump through some hoops in order to get back to the "full index"

You never officially had the full index context. You had it only because you jettisoned a
large part of the API.

bq.  this would be best handled by adding a Searcher as the first arg to Weight.scorer()

The current API would not support this without back compat breaks up the wazoo - the MultiSearcher
can be on the client - it's not available on the server. Passing just the local Searcher does
not jibe with the API.

{quote}for string sorting, it makes a big difference - you now have to do a bunch of String.equals()
calls, where you didn't have to in 2.4 (just used the ord index)
Given this reason, you should really be able to do string sorting 2 ways{quote}

This is only valid for those short circuiting the API and ignoring MultiSearcher and its effects
on the API. As a project, we can't and shouldn't support this type of thing unless we can
make it work with MultiSearcher or eventually pull MultiSearcher.
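
To make the ord-versus-string point concrete, here is a minimal sketch in plain Java. It uses no Lucene classes and is not Lucene's comparator. SegmentOrds, lookup, order, compareWithin and compareAcross are hypothetical names, and the model assumes each segment keeps its own sorted term dictionary, so an ord only has meaning inside the segment that produced it.

```java
import java.util.Arrays;

// Plain-Java model of per-segment ordinals; hypothetical names, no Lucene classes.
final class SegmentOrds {
    final String[] lookup; // sorted distinct terms of this segment; ord = index into this array
    final int[] order;     // order[doc] = ord of the term held by that segment-relative doc

    SegmentOrds(String[] docValues) {
        lookup = Arrays.stream(docValues).distinct().sorted().toArray(String[]::new);
        order = new int[docValues.length];
        for (int doc = 0; doc < docValues.length; doc++) {
            order[doc] = Arrays.binarySearch(lookup, docValues[doc]);
        }
    }

    // Inside one segment, comparing ords gives the same answer as comparing the strings.
    int compareWithin(int docA, int docB) {
        return Integer.compare(order[docA], order[docB]);
    }

    // Across segments the ords come from different dictionaries, so compare the terms.
    static int compareAcross(SegmentOrds a, int docA, SegmentOrds b, int docB) {
        return a.lookup[a.order[docA]].compareTo(b.lookup[b.order[docB]]);
    }

    public static void main(String[] args) {
        SegmentOrds s1 = new SegmentOrds(new String[] {"pear", "apple"});
        SegmentOrds s2 = new SegmentOrds(new String[] {"banana", "zebra"});
        // "apple" in s1 and "banana" in s2 both have ord 0, yet the strings differ.
        System.out.println(s1.order[1] == s2.order[0]);
        System.out.println(compareAcross(s1, 1, s2, 0) < 0);
    }
}
```

With a single set of ords built over the whole index, only compareWithin would be needed. That is the trade-off under discussion: faster comparisons against slower cache loading after a commit.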

bq. In the end, it should be up to the application developer to choose what strategy works
best for them, and their application (fast commits/fast cache loading may take a back seat
to fast query execution)

You can pick, but we have to be true to the API or change it (not easy with our back compat

> Weight.scorer() not passed doc offset for "sub reader"
> ------------------------------------------------------
>                 Key: LUCENE-1821
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 2.9
>            Reporter: Tim Smith
>             Fix For: 2.9
>         Attachments: LUCENE-1821.patch
> Now that searching is done on a per segment basis, there is no way for a Scorer to know the "actual" doc id for the documents it matches (only the relative doc offset into the segment)
> If using caches in your scorer that are based on the "entire" index (all segments), there is now no way to index into them properly from inside a Scorer because the scorer is not passed the needed offset to calculate the "real" docid
> suggest having Weight.scorer() method also take an integer for the doc offset
> Abstract Weight class should have a constructor that takes this offset as well as a method to get the offset
> All Weights that have "sub" weights must pass this offset down to created "sub" weights
> Details on workaround:
> In order to work around this, you must do the following:
> * Subclass IndexSearcher
> * Add "int getIndexReaderBase(IndexReader)" method to your subclass
> * during Weight creation, the Weight must hold onto a reference to the passed in Searcher (cast to your subclass)
> * during Scorer creation, the Scorer must be passed the result of YourSearcher.getIndexReaderBase(reader)
> * Scorer can now rebase any collected docids using this offset
> Example implementation of getIndexReaderBase():
> {code}
> // NOTE: a more efficient implementation is possible if you cache the result of gatherSubReaders in your constructor
> // Requires java.util.ArrayList, java.util.Iterator and java.util.List
> public int getIndexReaderBase(IndexReader reader) {
>   if (reader == getReader()) {
>     return 0;
>   } else {
>     List readers = new ArrayList();
>     gatherSubReaders(readers);
>     Iterator iter = readers.iterator();
>     int maxDoc = 0;
>     while (iter.hasNext()) {
>       IndexReader r = (IndexReader) iter.next();
>       if (r == reader) {
>         return maxDoc;
>       } 
>       maxDoc += r.maxDoc();
>     } 
>   }
>   return -1; // reader not in searcher
> }
> {code}
> Notes:
> * This workaround makes it so you cannot serialize your custom Weight implementation
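
The NOTE in the example above mentions caching the gathered sub readers. A minimal sketch of that idea follows, in plain Java with no Lucene dependency. ReaderBaseCache and baseOf are hypothetical names. Any object can stand in for a sub reader, and the maxDoc function passed to the constructor plays the role of IndexReader.maxDoc(). Readers are matched by identity, like the `r == reader` check in the example. The top-level reader case, which returns 0 in the example, is left to the caller.

```java
import java.util.Arrays;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToIntFunction;

// Hypothetical helper: computes every sub reader's doc base once, up front.
final class ReaderBaseCache<R> {
    private final Map<R, Integer> bases = new IdentityHashMap<>();

    // subReaders must be in index order; maxDoc stands in for IndexReader.maxDoc().
    ReaderBaseCache(List<R> subReaders, ToIntFunction<R> maxDoc) {
        int base = 0;
        for (R reader : subReaders) {
            bases.put(reader, base);
            base += maxDoc.applyAsInt(reader);
        }
    }

    // Returns the doc base of the given sub reader, or -1 if it is not in the searcher.
    int baseOf(R reader) {
        Integer base = bases.get(reader);
        return base == null ? -1 : base;
    }

    public static void main(String[] args) {
        // Three stand-in "segments" holding 5, 3 and 4 documents.
        int[] seg0 = new int[5];
        int[] seg1 = new int[3];
        int[] seg2 = new int[4];
        ReaderBaseCache<int[]> cache =
            new ReaderBaseCache<int[]>(Arrays.asList(seg0, seg1, seg2), s -> s.length);
        int segmentDoc = 2;                                   // doc id relative to seg1
        System.out.println(cache.baseOf(seg1) + segmentDoc);  // 5 + 2 = 7
        System.out.println(cache.baseOf(new int[1]));         // -1, unknown reader
    }
}
```

The lookup then costs one map access per Scorer creation, where the example walks the reader list every time. Such a cache would need rebuilding whenever the set of sub readers changes.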

