lucene-dev mailing list archives

From "Mark Miller (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1821) Weight.scorer() not passed doc offset for "sub reader"
Date Wed, 19 Aug 2009 02:27:14 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744847#action_12744847 ]

Mark Miller commented on LUCENE-1821:
-------------------------------------

The "internal" vs "external" terms are kind of confusing and made up - my fault really.

When I think of using the ids 'internally' I'm thinking that you are taking the index reader
and making no assumptions. You just use the single reader and its id space. You can use those
ids to get values, and you can map from those ids to values.

The assumption being made here is that you can load up ords for every doc, and that these ords
are comparable across the whole index: any two documents with the same value for a field map
to the same ord, whichever segment they live in. Nothing in the API promised that, to my knowledge
- it just happened to be a happy side effect.
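
To make the ord assumption concrete, here is a minimal sketch in plain Java. It does not use any Lucene API: the class OrdSketch and its ords() helper are hypothetical, and an "ord" is simply taken to be the rank of a document's value among the distinct sorted values of whatever set of documents it is computed over. Under that assumption, ords computed per segment disagree with ords computed over the whole index.

```java
import java.util.Arrays;
import java.util.TreeSet;

// Hypothetical illustration only; not Lucene's FieldCache or any real Lucene class.
final class OrdSketch {

    // The ord of each doc is the rank of its value among the distinct,
    // sorted values of the docs passed in.
    static int[] ords(String[] values) {
        String[] sorted = new TreeSet<>(Arrays.asList(values)).toArray(new String[0]);
        int[] ords = new int[values.length];
        for (int doc = 0; doc < values.length; doc++) {
            ords[doc] = Arrays.binarySearch(sorted, values[doc]);
        }
        return ords;
    }

    public static void main(String[] args) {
        String[] segmentA = {"banana", "cherry"};
        String[] segmentB = {"apple", "banana"};
        String[] wholeIndex = {"banana", "cherry", "apple", "banana"};

        // "banana" gets ord 0 in segment A but ord 1 in segment B.
        System.out.println(Arrays.toString(ords(segmentA)));
        System.out.println(Arrays.toString(ords(segmentB)));
        // Over the whole index, both "banana" docs share ord 1.
        System.out.println(Arrays.toString(ords(wholeIndex)));
    }
}
```

Comparing ords directly therefore only works when both ords come from the same id space; across segments the comparison has to fall back to the values themselves.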

bq. While sorting is provided by lucene APIs, there is nothing (and should be nothing) stopping
someone from performing sorting on their own terms via the Collector interface and their own
priority queues/API
 
Indeed - just like there is nothing stopping you from continuing to use a MultiReader for
this functionality.

What I mean by "sorting is internal" is that we specifically support comparing ords/values across
readers. I think we would prefer that you don't count on ids coming from the top reader or
a sub reader in other cases. We don't promise one way or another. We just give a reader and
say work with this reader.

Experts can generally jump around that if they need to - Solr does a bit of this - or you
can choose to continue using Multi-Readers.

I'm not saying we should make it impossible for you to do this - but I don't think we should
open a path for scorers to reconstruct multi-reader virtual ids. I don't think a Scorer should
know or care what type of IndexReader it is working with.
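
For readers unfamiliar with the "virtual id" arithmetic under discussion, the following is a rough sketch in plain Java. DocBaseSketch and its methods are hypothetical names, not Lucene API; the only assumption is that segments are laid out back to back, so a segment's base offset is the sum of the document counts of the segments before it. (In Lucene 2.9, Collector.setNextReader(IndexReader, int docBase) hands collectors an offset of this kind.)

```java
// Hypothetical illustration only; none of these names are Lucene API.
final class DocBaseSketch {

    // Base offset of each segment: the total doc count of all earlier segments.
    static int[] docBases(int[] maxDocs) {
        int[] bases = new int[maxDocs.length];
        int base = 0;
        for (int i = 0; i < maxDocs.length; i++) {
            bases[i] = base;
            base += maxDocs[i];
        }
        return bases;
    }

    // Segment-relative id -> id in the combined (multi-reader style) id space.
    static int toTopLevel(int docBase, int segmentDoc) {
        return docBase + segmentDoc;
    }

    // Combined id -> index of the segment that contains it.
    // Assumes 0 <= topLevelDoc < total doc count.
    static int segmentOf(int[] bases, int topLevelDoc) {
        int segment = bases.length - 1;
        while (segment > 0 && bases[segment] > topLevelDoc) {
            segment--;
        }
        return segment;
    }

    public static void main(String[] args) {
        int[] bases = docBases(new int[] {10, 5, 20}); // bases are 0, 10, 15
        int topLevel = toTopLevel(bases[1], 3);         // doc 3 of the second segment
        System.out.println(topLevel);                   // prints 13
        System.out.println(segmentOf(bases, topLevel)); // prints 1
    }
}
```

A cache keyed by whole-index ids can be indexed with toTopLevel() from code that has the base available, which keeps the scorer itself unaware of the reader layout.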

> Weight.scorer() not passed doc offset for "sub reader"
> ------------------------------------------------------
>
>                 Key: LUCENE-1821
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1821
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 2.9
>            Reporter: Tim Smith
>
> Now that searching is done on a per segment basis, there is no way for a Scorer to know
the "actual" doc id for the documents it matches (only the relative doc offset into the segment)
> If using caches in your scorer that are based on the "entire" index (all segments), there
is now no way to index into them properly from inside a Scorer because the scorer is not passed
the needed offset to calculate the "real" docid
> Suggest having Weight.scorer() method also take an integer for the doc offset
> Abstract Weight class should have a constructor that takes this offset as well as a method
to get the offset
> All Weights that have "sub" weights must pass this offset down to created "sub" weights
