spark-user mailing list archives

From Andrew Musselman <andrew.mussel...@gmail.com>
Subject Re: Joining data using Latitude, Longitude
Date Wed, 11 Mar 2015 09:55:13 GMT
Ted Dunning and Ellen Friedman's "Time Series Databases" has a section on
this with some approaches to geo-encoding:

https://www.mapr.com/time-series-databases-new-ways-store-and-access-data
http://info.mapr.com/rs/mapr/images/Time_Series_Databases.pdf

On Tue, Mar 10, 2015 at 3:53 PM, John Meehan <jnmeehan@gmail.com> wrote:

> There are some techniques you can use if you geohash
> <http://en.wikipedia.org/wiki/Geohash> the lat-lngs.  They will naturally
> be sorted by proximity (with some edge cases, so watch out).  If you go the
> join route, either by trimming the lat-lngs or geohashing them, you’re
> essentially grouping nearby locations into buckets, but you have to
> consider the borders of the buckets, since the nearest location may actually
> be in an adjacent bucket.  Here’s a paper that discusses an implementation:
> http://www.gdeepak.com/thesisme/Finding%20Nearest%20Location%20with%20open%20box%20query.pdf
>
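To make the bucket-border caveat above concrete, here is a minimal sketch in plain Python (no Spark), assuming "trimming" means snapping each lat/lng to a fixed-size grid cell. The names `CELL_DEG`, `cell_of`, `neighbors`, `build_index`, and `candidates` are hypothetical and not from any library. It looks in the query's own cell plus the 8 surrounding cells, so a point just across a cell border is still found.

```python
import math
from collections import defaultdict

# Assumed cell size in degrees; 0.01 degrees of latitude is roughly 1.1 km.
CELL_DEG = 0.01

def cell_of(lat, lon, size=CELL_DEG):
    """Map a coordinate to an integer grid cell (the 'trimmed' lat/lng key)."""
    return (math.floor(lat / size), math.floor(lon / size))

def neighbors(cell):
    """Yield the cell itself plus its 8 surrounding cells."""
    row, col = cell
    for d_row in (-1, 0, 1):
        for d_col in (-1, 0, 1):
            yield (row + d_row, col + d_col)

def build_index(points, size=CELL_DEG):
    """Group (name, lat, lon) tuples into buckets keyed by grid cell."""
    index = defaultdict(list)
    for name, lat, lon in points:
        index[cell_of(lat, lon, size)].append((name, lat, lon))
    return index

def candidates(index, lat, lon, size=CELL_DEG):
    """Collect points from the query's cell and all adjacent cells."""
    found = []
    for cell in neighbors(cell_of(lat, lon, size)):
        found.extend(index.get(cell, []))
    return found

# "A" and "B" are about 22 m apart but fall into different cells.
index = build_index([("A", 40.0099, -75.0),
                     ("B", 40.0101, -75.0),
                     ("C", 41.0, -75.0)])
print(candidates(index, 40.0099, -75.0))
```

This only narrows the search: the candidates still need to be ranked by real distance, and the true nearest point is missed if it lies more than one cell away. The sketch also ignores wrap-around at the antimeridian and the poles, and a cell's east-west width shrinks as latitude increases. In Spark, the same idea would presumably become a join on the cell key after emitting each query point once per neighboring cell.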
> On Mar 9, 2015, at 11:42 PM, Akhil Das <akhil@sigmoidanalytics.com> wrote:
>
> Are you using SparkSQL for the join? In that case I'm not quite sure you
> have a lot of options to join on the nearest co-ordinate. If you are using
> the normal Spark code (by creating key-value pairs keyed on lat,lon) you can
> apply certain logic like trimming the lat,lon etc. If you want more precise
> distances, then you are better off using the haversine formula:
> <http://www.movable-type.co.uk/scripts/latlong.html>
>
>
>
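For reference, the haversine formula mentioned above can be sketched in a few lines of plain Python. This is an illustration, not code from the thread: the constant `EARTH_RADIUS_KM` (a mean Earth radius on a spherical model) and the helper names `haversine_km` and `nearest` are my own assumptions.

```python
import math

# Assumed mean Earth radius in km; the formula treats the Earth as a sphere.
EARTH_RADIUS_KM = 6371.0088

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

def nearest(lat, lon, points):
    """Return the (name, lat, lon) tuple closest to the query point."""
    return min(points, key=lambda p: haversine_km(lat, lon, p[1], p[2]))

print(haversine_km(0.0, 0.0, 1.0, 0.0))  # one degree of latitude, in km
print(nearest(0.0, 0.0, [("far", 10.0, 10.0), ("near", 0.1, 0.1)]))
```

Comparing every pair of points this way is quadratic, so on large data sets it is probably best applied only to the small candidate set that a trimmed or geohashed key produces.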
