lucene-dev mailing list archives

From "Uwe Schindler (JIRA)" <j...@apache.org>
Subject [jira] Created: (LUCENE-2395) Add a scoring DistanceQuery that does not need caches and separate filters
Date Thu, 15 Apr 2010 11:08:48 GMT
Add a scoring DistanceQuery that does not need caches and separate filters
--------------------------------------------------------------------------

                 Key: LUCENE-2395
                 URL: https://issues.apache.org/jira/browse/LUCENE-2395
             Project: Lucene - Java
          Issue Type: Improvement
          Components: contrib/spatial
            Reporter: Uwe Schindler
             Fix For: 3.1


Based on a chat with Chris Male and my own ideas from implementing this for PANGAEA, I thought about the
broken distance query in contrib. It has the following shortcomings:
- It needs a separate query for the enclosing bounding box (bbox), which is constant-score
- It needs a separate filter to filter out hits beyond the maximum distance
- It has no scoring, so anyone who wants to sort by distance has to use a custom
sort. For that to work, spatial caches the distance calculations (which is broken for multi-segment
search)

The idea is now to combine all three things into one query, but keep it customizable:

We first thought about extending CustomScoreQuery, calculating the distance from the FieldCache
in the customScore method, and returning a score of 1 for distance=0, a score of 0 at the maximum distance,
and a score <0 for farther hits that are inside the bounding box but outside the distance circle.
To filter out such negative scores, we would need to override the scorer in CustomScoreQuery,
which is private.
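Why hits with negative scores appear at all can be seen from a rough worked estimate. It assumes a planar approximation (the area around the center treated as flat) and takes the enclosing bounding box to be the square circumscribing the distance circle of radius r = maxDistance. A document in a corner of that box lies at distance sqrt(2) r, so the linear score described above gives:

```latex
\mathrm{score}(d) = \frac{r - d}{r}, \qquad
\mathrm{score}(0) = 1, \quad
\mathrm{score}(r) = 0, \quad
\mathrm{score}\!\left(\sqrt{2}\,r\right) = 1 - \sqrt{2} \approx -0.414
```

Under the same approximation the circle covers pi/4 (about 78.5%) of the box area, so roughly one in five bounding-box hits would carry a negative score and have to be discarded.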

My proposal is now to use a heavily stripped-down CustomScoreQuery (not a subclass of it) whose scorer
calls a method getDistance(docId) in advance() and nextDoc() to calculate the distance
for the current doc. The scorer also stores this distance. If the distance > maxDistance,
it discards the hit and calls nextDoc() again. By default, the score() method returns
weight.value*(maxDistance - distance)/maxDistance, using the precalculated distance. So
the distance is calculated only once, in nextDoc()/advance().
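A minimal, Lucene-free sketch of that scorer loop follows. It illustrates the idea under stated assumptions and is not the proposed implementation: the bounding-box candidates are a sorted `int[]` instead of a real `DocIdSetIterator`, the per-doc distance lookup is an `IntToDoubleFunction`, and `weightValue` stands in for `weight.value`. All class and field names here are hypothetical.

```java
import java.util.function.IntToDoubleFunction;

/**
 * Hypothetical sketch of the proposed scorer: walks the bounding-box candidates,
 * computes each candidate's distance once in nextDoc()/advance(), drops candidates
 * beyond maxDistance, and reuses the cached distance in score().
 */
final class DistanceScorerSketch {
    // Same sentinel value as Lucene's DocIdSetIterator.NO_MORE_DOCS.
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[] bboxDocs;                 // sorted candidate doc ids (stand-in for the bbox DocIdSet)
    private final IntToDoubleFunction distanceOf; // stand-in for the pluggable distance calculation
    private final double maxDistance;
    private final float weightValue;              // stand-in for weight.value

    private int pos = -1;
    private int doc = -1;
    private double distance;                      // cached for score()

    DistanceScorerSketch(int[] bboxDocs, IntToDoubleFunction distanceOf,
                         double maxDistance, float weightValue) {
        this.bboxDocs = bboxDocs;
        this.distanceOf = distanceOf;
        this.maxDistance = maxDistance;
        this.weightValue = weightValue;
    }

    int docID() {
        return doc;
    }

    int nextDoc() {
        while (++pos < bboxDocs.length) {
            int candidate = bboxDocs[pos];
            double d = distanceOf.applyAsDouble(candidate); // computed exactly once per candidate
            if (d <= maxDistance) {
                distance = d;
                return doc = candidate;
            }
            // Inside the bounding box but outside the circle: throw the hit away and continue.
        }
        return doc = NO_MORE_DOCS;
    }

    int advance(int target) {
        // Skip candidates below target, then let nextDoc() apply the distance check.
        while (pos + 1 < bboxDocs.length && bboxDocs[pos + 1] < target) {
            pos++;
        }
        return nextDoc();
    }

    float score() {
        // Uses the distance cached by nextDoc()/advance(); nothing is recomputed here.
        return weightValue * (float) ((maxDistance - distance) / maxDistance);
    }
}
```

Because `score()` only reads the cached `distance`, a collector that calls it once per hit triggers no additional distance computation, which is the reason for doing the work in `nextDoc()`/`advance()`.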

To be able to plug in custom scoring, the following methods in the query can be overridden:
- float getDistanceScore(double distance) - by default returns (maxDistance - distance)/maxDistance;
allows score customization
- DocIdSet getBoundingBoxDocIdSet(Reader, LatLng sw, LatLng ne) - returns a DocIdSet for
the bounding box. By default it returns e.g. the DocIdSet of a NumericRangeFilter (NRF) or a cartesian tier filter.
You can even plug in any other DocIdSet, e.g. wrap a Query with QueryWrapperFilter
- In addition, a setter for the GeoDistanceCalculator that the scorer uses to get the distance.
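The customization hooks listed above could look roughly like the following Lucene-free sketch. Everything in it is hypothetical: `GeoDistanceCalculator` is modeled as a one-method interface, `HaversineCalculator` is just one possible implementation (great-circle distance on a sphere with the mean Earth radius, in km), and `DistanceQuerySketch` stands in for the query. The bounding-box `DocIdSet` hook is left out because it depends on Lucene's reader and filter types.

```java
/** Hypothetical stand-in for the GeoDistanceCalculator named in the proposal. */
interface GeoDistanceCalculator {
    double distance(double lat1, double lng1, double lat2, double lng2);
}

/** One possible calculator (an assumption): haversine great-circle distance in km. */
final class HaversineCalculator implements GeoDistanceCalculator {
    static final double EARTH_RADIUS_KM = 6371.0088; // mean Earth radius

    @Override
    public double distance(double lat1, double lng1, double lat2, double lng2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLng = Math.toRadians(lng2 - lng1);
        double a = Math.pow(Math.sin(dLat / 2), 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.pow(Math.sin(dLng / 2), 2);
        // Clamp against rounding error before asin().
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(Math.min(1.0, a)));
    }
}

/** Hypothetical query skeleton: only the customization hooks are modeled. */
class DistanceQuerySketch {
    private final double centerLat;
    private final double centerLng;
    protected final double maxDistance;
    private GeoDistanceCalculator calculator = new HaversineCalculator();

    DistanceQuerySketch(double centerLat, double centerLng, double maxDistance) {
        this.centerLat = centerLat;
        this.centerLng = centerLng;
        this.maxDistance = maxDistance;
    }

    /** Setter hook: swap in a different distance calculation. */
    void setDistanceCalculator(GeoDistanceCalculator calculator) {
        this.calculator = calculator;
    }

    /** What the scorer would call for a candidate doc, given that doc's lat/lng. */
    double distanceTo(double lat, double lng) {
        return calculator.distance(centerLat, centerLng, lat, lng);
    }

    /** Overridable hook; default is linear: 1 at the center, 0 at maxDistance. */
    float getDistanceScore(double distance) {
        return (float) ((maxDistance - distance) / maxDistance);
    }
}
```

A subclass could override `getDistanceScore`, for example with `1 / (1 + distance)`, to get a score that never becomes negative; in the proposed design the scorer would still drop hits beyond `maxDistance` independently of the scoring function.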

This query is almost finished in my head; it just needs coding :-)

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

