lucene-dev mailing list archives

From "David Smiley (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-2395) Add a scoring DistanceQuery that does not need caches and separate filters
Date Mon, 02 Dec 2013 15:13:39 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836593#comment-13836593 ]

David Smiley commented on LUCENE-2395:
--------------------------------------

These issues are unrelated.  The expressions module, I believe, allows convenient ways to
reference numbers in DocValues/FieldCache together with various functions (usually mathematical)
for Sorting or relevancy.  But that is expressly excluded from the issue title "does not need
caches".  That's a worthwhile goal for some use-cases -- no cache means more NRT friendly.
 Furthermore, AFAIK the Lucene expressions module is limited to single-valued fields whereas
an approach along the lines described in this issue, such as in my last comment specifically,
would support multi-valued spatial fields because it decodes the actual terms during its execution
and can thus reference the same doc from multiple terms/points.

> Add a scoring DistanceQuery that does not need caches and separate filters
> --------------------------------------------------------------------------
>
>                 Key: LUCENE-2395
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2395
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/spatial
>            Reporter: Uwe Schindler
>         Attachments: ASF.LICENSE.NOT.GRANTED--DistanceQuery.java, ASF.LICENSE.NOT.GRANTED--DistanceQuery.java
>
>
> After a chat with Chris Male, and drawing on my own ideas from implementing this for PANGAEA,
I thought about the broken distance query in contrib. It has the following problems:
> - It needs a query/filter for the enclosing bbox (which is constant score)
> - It needs a separate filter for filtering out hits too far away (inside bbox but outside
distance limit)
> - It has no scoring, so anybody who wants to sort by distance needs to use the custom
sort. For that to work, spatial caches the distance calculation (which is broken for multi-segment
search)
> The idea is now to combine all three things into one query, but keep it customizable:
> We first thought about extending CustomScoreQuery and calculating the distance from FieldCache
in the customScore method, returning a score of 1 for distance=0, score=0 at the max distance
and score<0 for hits farther away that are in the bounding box but not in the distance circle.
To filter out such negative scores, we would need to override the scorer in CustomScoreQuery,
which is private.
> My proposal is now to use a very stripped-down CustomScoreQuery (but not extend it) that
calls a method getDistance(docId) in its scorer's advance() and nextDoc(), which calculates
the distance for the current doc and also stores it in the scorer. If the distance is greater
than maxDistance, it throws away the hit and calls nextDoc() again. The score() method will
return per default weight.value*(maxDistance - distance)/maxDistance, using the precalculated
distance. So the distance is only calculated one time in nextDoc()/advance().
> To be able to plug in custom scoring, the following methods in the query can be overridden:
> - float getDistanceScore(double distance) - returns per default: (maxDistance - distance)/maxDistance;
allows score customization
> - DocIdSet getBoundingBoxDocIdSet(Reader, LatLng sw, LatLng ne) - returns a DocIdSet
for the bounding box. Per default it returns e.g. the DocIdSet of an NRF or a cartesian tier
filter. You can even plug in any other DocIdSet, e.g. wrap a Query with QueryWrapperFilter
> - support a setter for the GeoDistanceCalculator that is used by the scorer to get the
distance.
> - a LatLng provider (similar to CustomScoreProvider/ValueSource) that returns for a given
doc id the lat/lng. This method is called per IndexReader one time in scorer creation and
will retrieve the coordinates. By that we support FieldCache or whatever.
> This query is almost finished in my head, it just needs coding :-)
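
The scorer loop proposed above could look roughly like the following. This is only a minimal sketch in plain Java and does not use the real Lucene Scorer API: the class name DistanceScorerSketch, the sorted int[] standing in for the bounding-box DocIdSet, and the IntToDoubleFunction standing in for getDistance(docId) are all assumptions made for illustration. It also assumes maxDistance > 0.

```java
import java.util.function.IntToDoubleFunction;

/** Hypothetical sketch of the proposed scorer; not a Lucene class. */
final class DistanceScorerSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[] bboxDocs;                 // sorted doc ids from the bounding box (assumed input)
    private final IntToDoubleFunction distanceOf; // stands in for getDistance(docId)
    private final double maxDistance;             // assumed to be > 0
    private final float weightValue;              // stands in for weight.value

    private int pos = -1;     // index of the current candidate in bboxDocs
    private double distance;  // distance of the current hit, computed once

    DistanceScorerSketch(int[] bboxDocs, IntToDoubleFunction distanceOf,
                         double maxDistance, float weightValue) {
        this.bboxDocs = bboxDocs;
        this.distanceOf = distanceOf;
        this.maxDistance = maxDistance;
        this.weightValue = weightValue;
    }

    /** Moves to the next bbox candidate whose distance is within maxDistance. */
    int nextDoc() {
        while (++pos < bboxDocs.length) {
            double d = distanceOf.applyAsDouble(bboxDocs[pos]);
            if (d <= maxDistance) {
                distance = d;          // remembered so score() need not recompute it
                return bboxDocs[pos];
            }
            // inside the bounding box but outside the circle: throw the hit away
        }
        return NO_MORE_DOCS;
    }

    /** Skips candidates below target, then applies the same distance check. */
    int advance(int target) {
        while (pos + 1 < bboxDocs.length && bboxDocs[pos + 1] < target) {
            pos++;
        }
        return nextDoc();
    }

    /** Default distance score: 1 at distance 0, falling to 0 at maxDistance. */
    float getDistanceScore(double distance) {
        return (float) ((maxDistance - distance) / maxDistance);
    }

    /** Uses the distance stored by nextDoc()/advance(). */
    float score() {
        return weightValue * getDistanceScore(distance);
    }
}
```

In this sketch the distance is computed exactly once per candidate, inside nextDoc(), and a subclass could override getDistanceScore to plug in a different scoring curve, as the proposal suggests.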



--
This message was sent by Atlassian JIRA
(v6.1#6144)
