lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1821) Weight.scorer() not passed doc offset for "sub reader"
Date Wed, 19 Aug 2009 09:46:18 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744970#action_12744970 ]

Michael McCandless commented on LUCENE-1821:
--------------------------------------------


BTW contrib/spatial has exactly this same problem.  It currently
builds up a cache, keyed on the "top" (MultiReader's) docID, of the
precise distance computed by its precise distance filters, to then be
used during sorting.  Right now it simply computes its own docBase and
increments it every time getDocIdSet() is called (which is messy).
Though I think it could (and should) switch to a per-segment cache.
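
A minimal sketch of what such a per-segment cache could look like, in
plain Java. This is not the contrib/spatial code: the class name, the
opaque segment key, and the NaN "not computed" marker are all
assumptions made for illustration. The point is that each entry is
indexed by the segment-relative docID, so no docBase bookkeeping is
needed.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-segment distance cache (illustration only, not a Lucene class). */
final class PerSegmentDistanceCache {
    // One array per segment, keyed on an opaque per-segment object
    // (assumed here to be something stable such as the segment's reader).
    private final Map<Object, double[]> bySegment = new HashMap<>();

    /** Records a distance for a segment-relative docID. */
    void put(Object segmentKey, int maxDoc, int segmentDocId, double distance) {
        double[] distances = bySegment.computeIfAbsent(segmentKey, k -> {
            double[] d = new double[maxDoc];
            Arrays.fill(d, Double.NaN); // NaN marks "not computed yet"
            return d;
        });
        distances[segmentDocId] = distance;
    }

    /** Returns the cached distance, or NaN if none was recorded. */
    double get(Object segmentKey, int segmentDocId) {
        double[] distances = bySegment.get(segmentKey);
        return distances == null ? Double.NaN : distances[segmentDocId];
    }
}
```

Because the lookup uses only the segment key and the segment-relative
docID, the same docID in two different segments cannot collide.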

I am torn.  On the one hand we don't want to encourage apps to be
using "top docIDs" anywhere "down low" (eg Weight/Scorer).  We'd like
all such per-segment switching to happen "up high".

But on the other hand, this is quite a sudden change, and most
advanced apps will be using the top docIDs by definition (since
per-segment docIDs only become an [easy] option in 2.9), so it'd be
more friendly to offer up a cleaner migration path for such apps where
Weight/Scorer is told its docBase.
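
To make the docBase arithmetic concrete, here is a small sketch in
plain Java. It assumes the usual layout in which the "top" docIDs are
the sub readers' docIDs concatenated in order, so a segment's docBase
is the sum of maxDoc over the segments before it. The class and method
names are hypothetical and are not part of the Lucene API.

```java
/** Hypothetical helper for docBase arithmetic (illustration only, not a Lucene class). */
final class DocBases {
    private DocBases() {}

    /** Returns the docBase of each sub reader: the sum of maxDoc over all earlier sub readers. */
    static int[] fromMaxDocs(int[] maxDocs) {
        int[] starts = new int[maxDocs.length];
        int base = 0;
        for (int i = 0; i < maxDocs.length; i++) {
            starts[i] = base;
            base += maxDocs[i];
        }
        return starts;
    }

    /** Maps a segment-relative docID to the "top" docID. */
    static int toTopDocId(int docBase, int segmentDocId) {
        return docBase + segmentDocId;
    }
}
```

For example, with three segments whose maxDoc values are 5, 3 and 4,
the docBases are 0, 5 and 8, so segment-relative doc 2 in the third
segment is top doc 10. A Scorer that is told its docBase could use
this mapping to index into a cache built over the whole index.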

And, having to migrate an ord index from "top" to "sub" docIDs is
truly a nightmare, having gone through that with Mark in getting
String sorting to work per segment!


> Weight.scorer() not passed doc offset for "sub reader"
> ------------------------------------------------------
>
>                 Key: LUCENE-1821
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1821
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 2.9
>            Reporter: Tim Smith
>
> Now that searching is done on a per-segment basis, there is no way for a Scorer to know
> the "actual" doc id for the documents it matches (only the relative doc offset into the segment).
> If using caches in your scorer that are based on the "entire" index (all segments), there
> is now no way to index into them properly from inside a Scorer, because the scorer is not passed
> the needed offset to calculate the "real" docid.
> Suggest having the Weight.scorer() method also take an integer for the doc offset.
> The abstract Weight class should have a constructor that takes this offset, as well as a method
> to get the offset.
> All Weights that have "sub" weights must pass this offset down to created "sub" weights.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



