lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (LUCENE-7069) Add LatLonPoint.nearest to find closest indexed point to a given query point
Date Thu, 14 Apr 2016 09:43:25 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-7069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Michael McCandless updated LUCENE-7069:
---------------------------------------
    Attachment: LUCENE-7069.patch

Here's another patch ... I think it's nearly ready.

I added "nearest neighbor" to the luceneutil London, UK geo benchmark, and getting top 10
nearest hits from the center of each of the 225 boxes that benchmark tests is extremely fast
... less than 0.5 msec for each call ... KD trees are very good for this ;)

This adds {{LatLonPoint.nearest}}, and it now takes an {{int topN}} (vs. single nearest point
in previous patches).

I think it solves the adversary issues of the previous patches, at least the ones that are
solvable.  Some are not!  Imagine the index only contains many points indexed from a circle,
and then you ask for nearest from its center ... no matter your algorithm, that will amount
to a linear scan.

There's one wrinkle though: the search works directly with {{BKDReader}}, because it's a best
first search across all segments, picking which cell (across all segments) is the most "promising"
to descend into next.  This means it requires that you use {{Lucene60PointsFormat}} (the default
codec), and it gets the underlying {{BKDReader}} from that.  I think making this work only
through the codec APIs is ... tricky.

There are further optimizations we could do like using {{haversinSortKey}}, or maybe fixing
the {{approxBestDistance}} to be a true best distance (I need heavy geo math help for that!),
and using Lucene's priority queue not the JDK's, but these all can come later.


> Add LatLonPoint.nearest to find closest indexed point to a given query point
> ----------------------------------------------------------------------------
>
>                 Key: LUCENE-7069
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7069
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: master, 6.1
>
>         Attachments: LUCENE-7069.patch, LUCENE-7069.patch, LUCENE-7069.patch
>
>
> KD trees (used by Lucene's new dimensional points) excel at finding "nearest neighbors"
to a given query point ... I think we should add this to Lucene's sandbox as:
> {noformat}
>   public static Document nearest(IndexReader r, String field, double lat, double lon)
throws IOException
> {noformat}
> I only implemented the 1 nearest neighbor for starters ... I think we can easily generalize
this in the future to K nearest.
> It could also be generalized to more than 2 dimensions, but for now I'm making the class
package private in sandbox for just the geo2d (lat/lon) use case.
> I don't think this should go into 6.0.0, but should go into 6.1: it's a new feature,
and we need to wrap up and ship 6.0.0 already ;)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message