lucene-dev mailing list archives

From "Tim Smith (JIRA)" <>
Subject [jira] Commented: (LUCENE-1821) Weight.scorer() not passed doc offset for "sub reader"
Date Wed, 19 Aug 2009 01:47:14 GMT


Tim Smith commented on LUCENE-1821:

bq. The goal is to move all caches to the segment level in Lucene - we don't want to encourage
users to cache per multi-reader by providing API help to do so.
I agree that this is the goal, and that using per-segment caches should be the encouraged
route for field caching needs.
I plan to update the vast majority of the caches I use to load on a per-segment basis
once I switch to 2.9, to take advantage of this.
But it should still be possible for advanced users to do caching at the multi-reader level.
This may require porting with each subsequent version of Lucene (as I'm seeing I will have to
for 2.9), but it should remain possible.

bq. If you need index wide stats, you use the Weight.
I'm currently using a Weight to get this cache at the multi-reader level; however, with 2.9 I
will have to jump through some more hoops to be able to use this cache from each sub-reader's
scorer.
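Most of those hoops are bookkeeping: a 2.9 Scorer sees segment-relative doc ids, so reading a cache built over the whole multi-reader means adding that segment's doc base back. A minimal sketch in plain Java, assuming each segment's maxDoc is known and the segments are laid out end to end in order; `DocBaseMapper` and its methods are hypothetical names, not Lucene classes.

```java
// Hypothetical helper, NOT a Lucene API: computes per-segment doc bases,
// assuming the top-level reader lays its segments out end to end.
final class DocBaseMapper {
    private final int[] docBases; // docBases[i] = total maxDoc of segments 0..i-1
    private final int maxDoc;     // total maxDoc across all segments

    DocBaseMapper(int[] segmentMaxDocs) {
        docBases = new int[segmentMaxDocs.length];
        int base = 0;
        for (int i = 0; i < segmentMaxDocs.length; i++) {
            docBases[i] = base;
            base += segmentMaxDocs[i];
        }
        maxDoc = base;
    }

    int docBase(int segment) { return docBases[segment]; }

    /** Segment-relative doc id to top-level ("real") doc id. */
    int toTopLevel(int segment, int segmentDoc) {
        return docBases[segment] + segmentDoc;
    }

    /** Reads a cache built over the whole index from inside a per-segment scorer. */
    float lookup(float[] topLevelCache, int segment, int segmentDoc) {
        return topLevelCache[toTopLevel(segment, segmentDoc)];
    }

    /** Top-level doc id back to the index of the segment that holds it. */
    int segmentOf(int topLevelDoc) {
        if (topLevelDoc < 0 || topLevelDoc >= maxDoc) {
            throw new IllegalArgumentException("doc out of range: " + topLevelDoc);
        }
        // Binary search for the LAST segment whose base is <= doc, so empty
        // segments (which share a base with their successor) are skipped.
        int lo = 0, hi = docBases.length - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1; // upper midpoint guarantees progress
            if (docBases[mid] <= topLevelDoc) lo = mid; else hi = mid - 1;
        }
        return lo;
    }
}
```

The reverse mapping matters only when a top-level id has to be routed back to a segment; the forward mapping is the offset this issue asks to have passed to the scorer.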

bq. You are trying to use the internal ids externally
All my usage of "internal" doc ids occurs inside Weight, Scorer, and HitCollector implementations.
I don't see how this is really "external", as it uses published interfaces. It's just that
the interpretation of these interfaces changed for 2.9 (I have no problem with this as long
as I can port from 2.4 with minimal to moderate effort). They could only change because no
implementation provided by vanilla Lucene or in contrib required the
"holistic" view of the index.

bq. The FieldCache is the caching mechanism that Lucene supports with internal ids - and it
supports it per segment.
The FieldCache mechanism did not meet all my needs with regard to schema, retention policy, etc.,
so I have been doing caching in my own code base for quite some time. While FieldCache
usage should be encouraged, it should not be required of advanced users. It is acceptable
for advanced users to feel some pain when upgrading, but there should be a reasonably clear path
for doing so (without a loss of functionality, and ideally without requiring custom patches
on top of a released version of Lucene).

bq. Sorting is internal.
While sorting is provided by Lucene APIs, nothing stops (or should stop)
someone from sorting on their own terms via the Collector interface and their own
priority queues/APIs.
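As a sketch of that route: a collector-style class that keeps its own top-N priority queue, ordered by a sort key read from a cache indexed by top-level doc id. In 2.9 a `Collector` is handed each segment's doc base through `setNextReader(IndexReader reader, int docBase)`; the sketch mimics that with a hypothetical `setNextSegment(int docBase)` so it runs without Lucene on the classpath. `TopNByCacheCollector` and `sortKeys` are illustrative names, assumed for this example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical collector-style class, NOT Lucene's Collector: keeps the top n
// hits by a sort key read from a cache indexed by top-level doc id.
final class TopNByCacheCollector {
    private final int n;
    // Orders hits from worst to best: a lower key is worse; on equal keys the
    // higher doc id is worse, so earlier documents win ties.
    private final Comparator<Integer> byQuality;
    // Min-heap under byQuality: the worst kept hit sits at the head for cheap eviction.
    private final PriorityQueue<Integer> heap;
    private int docBase;

    TopNByCacheCollector(long[] sortKeys, int n) {
        if (n <= 0) throw new IllegalArgumentException("n must be positive");
        this.n = n;
        this.byQuality = Comparator.comparingLong((Integer d) -> sortKeys[d])
                .thenComparing(Comparator.<Integer>reverseOrder());
        this.heap = new PriorityQueue<>(n, byQuality);
    }

    /** Plays the role of 2.9's setNextReader(reader, docBase). */
    void setNextSegment(int docBase) { this.docBase = docBase; }

    /** Receives a segment-relative doc id, as a 2.9 Collector does. */
    void collect(int segmentDoc) {
        int doc = docBase + segmentDoc; // top-level id, valid for the cache
        if (heap.size() < n) {
            heap.add(doc);
        } else if (byQuality.compare(doc, heap.peek()) > 0) {
            heap.poll(); // evict the current worst hit
            heap.add(doc);
        }
    }

    /** Top-level doc ids, best first. */
    List<Integer> topDocs() {
        List<Integer> result = new ArrayList<>(heap);
        result.sort(byQuality.reversed());
        return result;
    }
}
```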

> Weight.scorer() not passed doc offset for "sub reader"
> ------------------------------------------------------
>                 Key: LUCENE-1821
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 2.9
>            Reporter: Tim Smith
> Now that searching is done on a per-segment basis, there is no way for a Scorer to know
> the "actual" doc id of the documents it matches (only the doc id relative to the segment).
> If you use caches in your scorer that are based on the "entire" index (all segments), there
> is now no way to index into them properly from inside a Scorer, because the scorer is not passed
> the offset needed to calculate the "real" doc id.
> Suggest having the Weight.scorer() method also take an integer for the doc offset.
> The abstract Weight class should have a constructor that takes this offset, as well as a method
> to get the offset.
> All Weights that have "sub" weights must pass this offset down to the "sub" weights they create.
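The proposal can be modeled in a few lines. In this sketch the classes are hypothetical stand-ins, not Lucene's `Weight`/`Scorer`: a "segment" is simplified to a term-to-postings map and a "scorer" to the list of matching ids. `scorer()` takes the doc offset as an extra argument, and a composite weight forwards that same offset to each sub-weight, so every leaf can produce top-level ids. The issue also suggests storing the offset on the Weight via a constructor and getter; only the parameter variant is shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical stand-in, NOT a Lucene class: maps each term to the
// segment-relative doc ids containing it (assumed sorted ascending).
final class SketchSegment {
    final Map<String, int[]> postings;
    SketchSegment(Map<String, int[]> postings) { this.postings = postings; }
}

// Models the proposal: scorer() also receives the segment's doc offset.
abstract class SketchWeight {
    /** Returns the matching docs as top-level ("real") doc ids. */
    abstract List<Integer> scorer(SketchSegment segment, int docOffset);
}

final class TermSketchWeight extends SketchWeight {
    private final String term;
    TermSketchWeight(String term) { this.term = term; }

    @Override
    List<Integer> scorer(SketchSegment segment, int docOffset) {
        List<Integer> out = new ArrayList<>();
        for (int segDoc : segment.postings.getOrDefault(term, new int[0])) {
            out.add(docOffset + segDoc); // usable as an index into a top-level cache
        }
        return out;
    }
}

final class OrSketchWeight extends SketchWeight {
    private final List<? extends SketchWeight> subWeights;
    OrSketchWeight(List<? extends SketchWeight> subWeights) { this.subWeights = subWeights; }

    @Override
    List<Integer> scorer(SketchSegment segment, int docOffset) {
        TreeSet<Integer> union = new TreeSet<>(); // sorted, de-duplicated
        for (SketchWeight sub : subWeights) {
            // The composite must pass the same offset down to every sub-weight.
            union.addAll(sub.scorer(segment, docOffset));
        }
        return new ArrayList<>(union);
    }
}
```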

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
