lucene-dev mailing list archives

From "Mark Miller (JIRA)" <>
Subject [jira] Commented: (LUCENE-1821) Weight.scorer() not passed doc offset for "sub reader"
Date Wed, 19 Aug 2009 01:29:14 GMT


Mark Miller commented on LUCENE-1821:

bq. I think Tim's got a valid point though about wanting an ordinal value across the entire
index ...

I don't disagree about wanting them at all. He's using them for a neat purpose.

bq. he's not using external ids, he's using the internal lucene docIds

If he were respecting the internal ids, he wouldn't need to calculate the multi-reader id.
He's essentially caching the multi-reader ids; that's the same as using a filter that always
allows doc 0 to pass: it's using the internal ids externally. To use the ids correctly, you
get a reader, and with it an id space that starts at 0 for that reader. If you want to use the
whole reader, you should work with the multi-reader. If you need to, you can use the multi-reader
here too, without breaking it apart.

I think it's a slippery slope: we'd start having to support both the segment ids and the
multi-reader ids. And as we work on real-time search, we would have to count on users caching
that way. I think it's better to try to direct all of our support toward per-segment.

I'll leave it to smarter people to discuss for now, but I don't think it's the right path.
He can essentially do what he needs without built-in support, and personally I think that's
the way to go. I think it's great that right now, apart from the sorting/HitCollector code,
nothing knows about the sub-reader breakout.
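A hedged sketch of doing this without built-in support, assuming the caller can list the segment readers in the same order the searcher visits them: compute each segment's doc base once, keyed by object identity, and let scorer-side code look up the offset for whichever segment it is handed. Segment, SegmentOffsets and CachedValueLookup are hypothetical stand-ins (Segment takes the place of a per-segment IndexReader), not Lucene classes.

```java
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a per-segment IndexReader; only its size matters here. */
final class Segment {
    final int maxDoc;

    Segment(int maxDoc) {
        this.maxDoc = maxDoc;
    }
}

/**
 * Hypothetical helper, built once per top-level reader: remembers the doc
 * base of each segment so code that only sees a segment can recover
 * top-level ids. Keys are compared by identity, since the searcher hands
 * down the very same segment objects.
 */
final class SegmentOffsets {
    private final Map<Segment, Integer> bases = new IdentityHashMap<>();

    SegmentOffsets(List<Segment> segmentsInSearchOrder) {
        int base = 0;
        for (Segment s : segmentsInSearchOrder) {
            bases.put(s, base);
            base += s.maxDoc;
        }
    }

    int docBase(Segment segment) {
        Integer base = bases.get(segment);
        if (base == null) {
            throw new IllegalArgumentException("segment is not part of this reader");
        }
        return base;
    }
}

/** Hypothetical scorer-side helper: reads a whole-index cache using segment-relative ids. */
final class CachedValueLookup {
    private final float[] wholeIndexCache; // one entry per top-level doc id
    private final int docBase;

    CachedValueLookup(float[] wholeIndexCache, SegmentOffsets offsets, Segment segment) {
        this.wholeIndexCache = wholeIndexCache;
        this.docBase = offsets.docBase(segment);
    }

    float valueFor(int segmentDoc) {
        return wholeIndexCache[docBase + segmentDoc];
    }
}
```

The offsets (and the whole-index cache) must be rebuilt whenever the top-level reader is reopened, because doc bases shift as segments are added or merged.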

> Weight.scorer() not passed doc offset for "sub reader"
> ------------------------------------------------------
>                 Key: LUCENE-1821
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 2.9
>            Reporter: Tim Smith
> Now that searching is done on a per-segment basis, there is no way for a Scorer to know the "actual" doc id for the documents it matches (only the relative doc offset into the segment).
> If using caches in your scorer that are based on the "entire" index (all segments), there is now no way to index into them properly from inside a Scorer, because the scorer is not passed the offset needed to calculate the "real" docid.
> Suggest having the Weight.scorer() method also take an integer for the doc offset.
> The abstract Weight class should have a constructor that takes this offset, as well as a method to get the offset.
> All Weights that have "sub" weights must pass this offset down to the "sub" weights they create.
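To make the proposal concrete, here is a simplified model of it, not Lucene's actual Weight/Scorer API: scorer() receives the segment's doc offset, the resulting scorer can turn a segment-relative id into a "real" id, and a composite weight forwards the same offset to each sub-weight. All class names are hypothetical, the segment argument is an untyped placeholder for the per-segment IndexReader, and only the extra-parameter variant is modeled, not the suggested Weight constructor.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical scorer that knows the doc offset of the segment it runs on. */
abstract class OffsetScorer {
    private final int docOffset;

    OffsetScorer(int docOffset) {
        this.docOffset = docOffset;
    }

    int docOffset() {
        return docOffset;
    }

    /** Segment-relative doc id -> whole-index ("real") doc id. */
    final int realDoc(int segmentDoc) {
        return docOffset + segmentDoc;
    }
}

/** Hypothetical weight: scorer() takes the segment plus its doc offset. */
abstract class OffsetWeight {
    // 'segment' is a placeholder for the per-segment IndexReader
    abstract OffsetScorer scorer(Object segment, int docOffset);
}

/** A leaf weight: its scorer simply records the offset it was given. */
final class LeafOffsetWeight extends OffsetWeight {
    @Override
    OffsetScorer scorer(Object segment, int docOffset) {
        return new OffsetScorer(docOffset) { };
    }
}

/** A composite weight: it must forward the same offset to every sub-weight. */
final class CompositeOffsetWeight extends OffsetWeight {
    private final List<OffsetWeight> subWeights;

    CompositeOffsetWeight(List<OffsetWeight> subWeights) {
        this.subWeights = subWeights;
    }

    @Override
    CompositeOffsetScorer scorer(Object segment, int docOffset) {
        List<OffsetScorer> subScorers = new ArrayList<>();
        for (OffsetWeight w : subWeights) {
            subScorers.add(w.scorer(segment, docOffset));
        }
        return new CompositeOffsetScorer(docOffset, subScorers);
    }
}

/** Scorer produced by the composite weight; keeps its sub-scorers. */
final class CompositeOffsetScorer extends OffsetScorer {
    final List<OffsetScorer> subScorers;

    CompositeOffsetScorer(int docOffset, List<OffsetScorer> subScorers) {
        super(docOffset);
        this.subScorers = subScorers;
    }
}
```

The sketch also shows the cost raised in the comment above: every composite weight has to thread the extra parameter through, so any weight that forgets to forward it silently breaks the "real" ids of everything beneath it.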

