lucene-dev mailing list archives

From "Mark Miller (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (LUCENE-1821) Weight.scorer() not passed doc offset for "sub reader"
Date Fri, 21 Aug 2009 13:28:15 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745963#action_12745963 ]

Mark Miller edited comment on LUCENE-1821 at 8/21/09 6:27 AM:
--------------------------------------------------------------

bq. (it was undefined at best, but was clear from both Lucene code and my use of the API that I did have the full index context)

It was an implementation detail. If you look at MultiSearcher, Searchable, Searcher and how
the API is put together, you can see we don't support that type of thing. I think it's fairly
clear after a little thought.

You can limit your APIs to handle just IndexSearchers, but as a project, we cannot.

bq. it seems like it should be ok to pass the IndexSearcher (the direct context for the IndexReader)


MultiSearcher and Searchable make this impossible IMO. We would be playing to those that don't
fully use the API, and that's a mistake in my opinion. At best, we would have to shift the
whole API.

It's okay to pass the Reader because it's a contextless Reader. There is no value in also passing
a contextless Searcher IMO, especially when it's an arbitrary different context. We have to
live up to the current API - you're throwing MultiSearcher, Searchable, and Remote out the window.
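To make the "arbitrary different context" point concrete, here is a toy model of how doc ids nest when an IndexSearcher sits inside a MultiSearcher. It assumes, as a simplification, that each level numbers documents by adding the total maxDoc of the entries before it (segments inside an IndexSearcher, searchers inside a MultiSearcher). No Lucene classes are used; `NestedDocIds` and its methods are hypothetical names. A base obtained from the IndexSearcher only gets you to the IndexSearcher-level id, which can collide with an id from another sub-searcher.

```java
/**
 * Toy model of nested doc-id bases: segments inside an IndexSearcher,
 * IndexSearchers inside a MultiSearcher. Names and sizes are made up.
 */
public class NestedDocIds {

    /** Base of entry `index` = sum of the sizes of all earlier entries. */
    static int baseOf(int[] sizes, int index) {
        int base = 0;
        for (int i = 0; i < index; i++) {
            base += sizes[i];
        }
        return base;
    }

    /** Total doc count (maxDoc) of one IndexSearcher. */
    static int size(int[] segmentSizes) {
        return baseOf(segmentSizes, segmentSizes.length);
    }

    /** Id as seen inside one IndexSearcher: segment base + segment-relative doc. */
    static int searcherLevelId(int[][] segs, int searcher, int segment, int localDoc) {
        return baseOf(segs[searcher], segment) + localDoc;
    }

    /** Id as seen by the MultiSearcher: additionally offset by the earlier searchers' sizes. */
    static int multiSearcherLevelId(int[][] segs, int searcher, int segment, int localDoc) {
        int searcherBase = 0;
        for (int s = 0; s < searcher; s++) {
            searcherBase += size(segs[s]);
        }
        return searcherBase + searcherLevelId(segs, searcher, segment, localDoc);
    }

    public static void main(String[] args) {
        // Two IndexSearchers: one with segments of maxDoc 4 and 6, one with 2 and 3.
        int[][] segs = { {4, 6}, {2, 3} };
        // Doc 3 of searcher 0 / segment 0, and doc 1 of searcher 1 / segment 1:
        System.out.println(searcherLevelId(segs, 0, 0, 3) + " vs " + searcherLevelId(segs, 1, 1, 1));           // 3 vs 3
        System.out.println(multiSearcherLevelId(segs, 0, 0, 3) + " vs " + multiSearcherLevelId(segs, 1, 1, 1)); // 3 vs 13
    }
}
```

A cache built over the whole MultiSearcher would need slot 13 for the second document, but an offset derived from the IndexSearcher alone can only produce 3.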

bq. be fair, 2.9 has a lot of back compat breaks,

Oh I'm fair, I know that for sure - though I do like to argue way too much for my own good.
All of these back compat breaks were painful to stomach ;) But we reached each one under special
circumstances - usually our own early incompetence :) We technically are not allowed to just
break things, though. We break to fix what we already accidentally broke, or we break when
we screwed up earlier and are now between a rock and a hard place - or we break when
something else is broken anyway, so let's do more :) This was the release of the break for sure.
We don't necessarily want this to happen every release, though, and it's our responsibility
to strive towards our back compat policy (listed on the wiki).

I'm not talking about a break in adding a Searcher - that would be fine, back compat is already
broken there - but unless we can pass a MultiSearcher there over a remote RMI call, it's a
break of the whole API IMO.
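Why holding a Searcher reference hurts remote searching can be sketched with plain Java serialization. This assumes that, for remote (RMI) searching, a Weight has to survive Java serialization to cross the call; `LocalSearcher` and `OffsetAwareWeight` are hypothetical stand-ins, not Lucene classes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class WeightSerializationDemo {

    /** Hypothetical stand-in for a local IndexSearcher subclass: not Serializable. */
    static final class LocalSearcher {
        int getIndexReaderBase(Object reader) { return 0; }
    }

    /** Hypothetical Weight that keeps the Searcher it was created with, as the workaround requires. */
    static final class OffsetAwareWeight implements Serializable {
        private static final long serialVersionUID = 1L;
        private final float boost = 1.0f;       // ordinary state serializes fine
        private final LocalSearcher searcher;   // this reference does not

        OffsetAwareWeight(LocalSearcher searcher) { this.searcher = searcher; }
    }

    /** Returns true if obj survives Java serialization, which a remote call would need. */
    static boolean canSerialize(Object obj) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(canSerialize(new OffsetAwareWeight(new LocalSearcher()))); // false
        System.out.println(canSerialize(new OffsetAwareWeight(null)));                // true
    }
}
```

Marking the field `transient` would let the Weight serialize, but it would then arrive on the remote side without its Searcher, so the offset lookup still could not work there.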

> Weight.scorer() not passed doc offset for "sub reader"
> ------------------------------------------------------
>
>                 Key: LUCENE-1821
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1821
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 2.9
>            Reporter: Tim Smith
>             Fix For: 2.9
>
>         Attachments: LUCENE-1821.patch
>
>
> Now that searching is done on a per-segment basis, there is no way for a Scorer to know the "actual" doc id for the documents it matches (only the relative doc offset into the segment).
> If using caches in your scorer that are based on the "entire" index (all segments), there is now no way to index into them properly from inside a Scorer, because the scorer is not passed the offset needed to calculate the "real" docid.
> Suggest having the Weight.scorer() method also take an integer for the doc offset.
> The abstract Weight class should have a constructor that takes this offset, as well as a method to get the offset.
> All Weights that have "sub" weights must pass this offset down to the "sub" weights they create.
> Details on workaround:
> In order to work around this, you must do the following:
> * Subclass IndexSearcher
> * Add "int getIndexReaderBase(IndexReader)" method to your subclass
> * during Weight creation, the Weight must hold onto a reference to the passed-in Searcher (cast to your subclass)
> * during Scorer creation, the Scorer must be passed the result of YourSearcher.getIndexReaderBase(reader)
> * Scorer can now rebase any collected docids using this offset
> Example implementation of getIndexReaderBase():
> {code}
> // NOTE: a more efficient implementation can cache the result of gatherSubReaders in your constructor
> public int getIndexReaderBase(IndexReader reader) {
>   if (reader == getReader()) {
>     return 0;
>   } else {
>     List readers = new ArrayList();
>     gatherSubReaders(readers);
>     Iterator iter = readers.iterator();
>     int maxDoc = 0;
>     while (iter.hasNext()) {
>       IndexReader r = (IndexReader)iter.next();
>       if (r == reader) {
>         return maxDoc;
>       } 
>       maxDoc += r.maxDoc();
>     } 
>   }
>   return -1; // reader not in searcher
> }
> {code}
> Notes:
> * This workaround means your custom Weight implementation can no longer be serialized, since it holds a reference to the Searcher
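The NOTE in the workaround example suggests caching the gathered sub-readers instead of re-gathering them on every call. A minimal sketch of that idea, assuming only that each sub-reader's base is the sum of maxDoc() over the sub-readers before it (which is what the loop in getIndexReaderBase() computes): the bases are built once, and a Scorer-side lookup becomes one map hit plus an addition. To stay self-contained it does not depend on Lucene; `Segment`, `DocBaseTable`, and `DocBaseDemo` are hypothetical names, and the top-level-reader shortcut (`return 0`) is left out.

```java
import java.util.Arrays;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a segment-level IndexReader; only maxDoc() matters here. */
class Segment {
    private final int maxDoc;

    Segment(int maxDoc) { this.maxDoc = maxDoc; }

    int maxDoc() { return maxDoc; }
}

/** Hypothetical helper: computes every segment's doc base once, e.g. in a searcher's constructor. */
class DocBaseTable {
    // Identity keys, mirroring the `r == reader` comparison in getIndexReaderBase().
    private final Map<Segment, Integer> bases = new IdentityHashMap<>();
    private final int maxDoc;

    DocBaseTable(List<Segment> segments) {
        int base = 0;
        for (Segment s : segments) {
            bases.put(s, base);   // base = sum of maxDoc() over all earlier segments
            base += s.maxDoc();
        }
        maxDoc = base;            // total size of the index, i.e. the size of an index-wide cache
    }

    /** Same contract as getIndexReaderBase(): -1 if the segment is not in this index. */
    int baseOf(Segment s) {
        Integer base = bases.get(s);
        return base == null ? -1 : base;
    }

    /** Rebases a segment-relative doc id (what a Scorer sees) to an index-wide doc id. */
    int toGlobal(Segment s, int localDoc) {
        int base = baseOf(s);
        if (base < 0 || localDoc < 0 || localDoc >= s.maxDoc()) {
            throw new IllegalArgumentException("doc " + localDoc + " is not in this index");
        }
        return base + localDoc;
    }

    int maxDoc() { return maxDoc; }
}

public class DocBaseDemo {
    public static void main(String[] args) {
        Segment a = new Segment(5), b = new Segment(3), c = new Segment(7);
        DocBaseTable table = new DocBaseTable(Arrays.asList(a, b, c));

        // An index-wide cache, one slot per document across all segments.
        float[] cache = new float[table.maxDoc()];
        cache[table.toGlobal(c, 2)] = 42f;   // doc 2 of segment c lands in slot 10

        System.out.println(table.baseOf(b) + " " + table.baseOf(c) + " " + cache[10]); // 5 8 42.0
    }
}
```

An IdentityHashMap is used rather than a HashMap so that two distinct readers are never conflated, matching the `==` test in the original loop.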

-- 
This message is automatically generated by JIRA.