Mailing-List: contact java-dev-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: java-dev@lucene.apache.org
Message-ID: <1749681280.1250648834882.JavaMail.jira@brutus>
Date: Tue, 18 Aug 2009 19:27:14 -0700 (PDT)
From: "Mark Miller (JIRA)" <jira@apache.org>
To: java-dev@lucene.apache.org
Subject: [jira] Commented: (LUCENE-1821) Weight.scorer() not passed doc
 offset for "sub reader"
In-Reply-To: <1931573201.1250629228444.JavaMail.jira@brutus>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/LUCENE-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744847#action_12744847 ] 

Mark Miller commented on LUCENE-1821:
-------------------------------------

The "internal" vs "external" terms are confusing and made up - my fault really.

When I think of using the ids 'internally' I'm thinking that you are taking the index reader and making no assumptions. You just use the single reader and its id space. You can use those ids to get values, and you can map from those ids to values.

The assumption being made here is that you can load up ords for every doc and that these ords will be comparable in a way that every document id across the whole index maps to the same ord if it has the same value for a field. Nothing in the API promised that to my knowledge - it just happened to be a happy side effect. 
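
To make the ord point concrete, here is a minimal sketch in plain Java. It is not Lucene code: the class SegmentOrds and the method ordsFor are hypothetical names, and an "ord" here is assumed to be simply the rank of a document's value among the distinct sorted values seen by one reader. Under that assumption, the same value gets different ords in different segments, while a single reader over all documents assigns one consistent ord per value.

```java
import java.util.Arrays;
import java.util.TreeSet;

// Hypothetical illustration only; not part of the Lucene API.
final class SegmentOrds {

    // For one reader, give each document the rank of its value among
    // that reader's distinct, sorted values.
    static int[] ordsFor(String[] valuesByDoc) {
        String[] sorted =
            new TreeSet<String>(Arrays.asList(valuesByDoc)).toArray(new String[0]);
        int[] ords = new int[valuesByDoc.length];
        for (int doc = 0; doc < valuesByDoc.length; doc++) {
            ords[doc] = Arrays.binarySearch(sorted, valuesByDoc[doc]);
        }
        return ords;
    }

    public static void main(String[] args) {
        String[] segmentA = {"banana", "cherry"};
        String[] segmentB = {"apple", "banana", "cherry"};
        // The whole index seen as one id space: segment A's docs, then segment B's.
        String[] wholeIndex = {"banana", "cherry", "apple", "banana", "cherry"};

        // "cherry" ranks differently in A and B, so per-segment ords
        // cannot be compared with each other directly.
        System.out.println("A:     " + Arrays.toString(ordsFor(segmentA)));
        System.out.println("B:     " + Arrays.toString(ordsFor(segmentB)));
        // Over the single id space, equal values always share an ord.
        System.out.println("whole: " + Arrays.toString(ordsFor(wholeIndex)));
    }
}
```

Code that relied on the whole-index behaviour keeps working against a single top-level reader, but sees different numbers once it is handed one segment at a time.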

bq. While sorting is provided by lucene APIs, there is nothing (and should be nothing) stopping someone from performing sorting on their own terms via the Collector interface and their own priority queues/API
 
Indeed - just like there is nothing stopping you from continuing to use a MultiReader for this functionality.

What I mean by "sorting is internal" is that we specifically support comparing ords/values across readers. In other cases, I think we would prefer that you don't count on ids coming from either the top reader or a sub reader. We don't promise one way or the other. We just give you a reader and say work with this reader.

Experts can generally work around that if they need to - Solr does a bit of this - or you can choose to continue using MultiReaders.

I'm not saying we should make it impossible for you to do this - but I don't think we should open a path for scorers to reconstruct multi-reader virtual ids. I don't think a Scorer should know or care what type of IndexReader it is working with.

> Weight.scorer() not passed doc offset for "sub reader"
> ------------------------------------------------------
>
>                 Key: LUCENE-1821
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1821
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 2.9
>            Reporter: Tim Smith
>
> Now that searching is done on a per segment basis, there is no way for a Scorer to know the "actual" doc id for the documents it matches (only the relative doc offset into the segment)
> If using caches in your scorer that are based on the "entire" index (all segments), there is now no way to index into them properly from inside a Scorer because the scorer is not passed the needed offset to calculate the "real" docid
> Suggest having the Weight.scorer() method also take an integer for the doc offset
> Abstract Weight class should have a constructor that takes this offset as well as a method to get the offset
> All Weights that have "sub" weights must pass this offset down to created "sub" weights
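
The arithmetic behind the requested offset can be sketched as follows. This is a plain Java illustration and not the Lucene API: DocBaseSketch, docBases and toTopLevelDoc are hypothetical names, and the sketch assumes top-level doc ids are laid out as the segments concatenated in order, so a segment's offset is the total size of the segments before it.

```java
// Hypothetical illustration only; not part of the Lucene API.
final class DocBaseSketch {

    // Starting offset of each segment: the running total of the
    // sizes of all segments that come before it.
    static int[] docBases(int[] segmentSizes) {
        int[] bases = new int[segmentSizes.length];
        int next = 0;
        for (int i = 0; i < segmentSizes.length; i++) {
            bases[i] = next;
            next += segmentSizes[i];
        }
        return bases;
    }

    // Map a segment-relative doc id back to the whole-index id space.
    static int toTopLevelDoc(int docBase, int segmentDoc) {
        return docBase + segmentDoc;
    }

    public static void main(String[] args) {
        int[] bases = docBases(new int[] {3, 2, 4});

        // A cache built against the entire index: one entry per top-level doc id.
        float[] wholeIndexCache = {0.1f, 0.2f, 0.3f, 0.4f, 0.5f, 0.6f, 0.7f, 0.8f, 0.9f};

        // Inside a per-segment scorer only segmentDoc is known; the offset
        // has to be supplied from outside to index into the cache.
        int segment = 1;
        int segmentDoc = 1;
        int topLevelDoc = toTopLevelDoc(bases[segment], segmentDoc);
        System.out.println("top-level doc " + topLevelDoc
            + " -> cached value " + wholeIndexCache[topLevelDoc]);
    }
}
```

Without the offset, a scorer that used segmentDoc directly would read the cache entry belonging to a document in the first segment.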

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

