incubator-cassandra-user mailing list archives

From Aanand Prasad <aanand.pra...@gmail.com>
Subject Re: geo coding, long/lats?
Date Fri, 19 Mar 2010 18:54:05 GMT
I've implemented a basic geospatial search against a Cassandra dataset by
keeping a column family of items indexed by geohash
(http://en.wikipedia.org/wiki/Geohash). Essentially, to search for items
within a given area, you calculate the longest geohash whose cell still
covers the entire area and use it to do a prefix query (i.e. a range scan)
on the index.
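
To make the scheme concrete, here is a minimal Python sketch. It is an
illustration, not the Cassandra code described above: a sorted Python list
of (geohash, item_id, lat, lon) tuples stands in for the column family,
bisect stands in for the range scan, and the function names
(geohash_encode, bounding_hash, build_index, search_box) are hypothetical.
The bounding hash is taken as the longest common prefix of the four corner
hashes, and candidates are filtered against the box afterwards, because the
covering cell is usually larger than the search area.

```python
import bisect
import os.path

# Standard geohash base-32 alphabet (no a, i, l, o).
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=9):
    """Encode (lat, lon) as a geohash of `precision` base-32 characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars, value, nbits = [], 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            bit = lon >= mid
            lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = lat >= mid
            lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
        value = (value << 1) | int(bit)
        nbits += 1
        use_lon = not use_lon
        if nbits == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[value])
            value, nbits = 0, 0
    return "".join(chars)


def bounding_hash(min_lat, min_lon, max_lat, max_lon, precision=9):
    """Longest prefix shared by the box's corners: the smallest cell holding it."""
    corners = [geohash_encode(la, lo, precision)
               for la in (min_lat, max_lat) for lo in (min_lon, max_lon)]
    return os.path.commonprefix(corners)


def build_index(items, precision=9):
    """Sorted (geohash, item_id, lat, lon) tuples: a stand-in for the column family."""
    return sorted((geohash_encode(lat, lon, precision), item_id, lat, lon)
                  for item_id, (lat, lon) in items.items())


def search_box(index, min_lat, min_lon, max_lat, max_lon):
    """Prefix-scan the index with the bounding hash, then drop points outside the box."""
    prefix = bounding_hash(min_lat, min_lon, max_lat, max_lon)
    start = bisect.bisect_left(index, (prefix,))
    hits = []
    for geohash, item_id, lat, lon in index[start:]:
        if not geohash.startswith(prefix):
            break  # keys are sorted, so no later key can share the prefix
        if min_lat <= lat <= max_lat and min_lon <= lon <= max_lon:
            hits.append(item_id)
    return sorted(hits)


items = {"a": (51.505, 0.015), "b": (51.505, -0.015), "c": (48.85, 2.35)}
index = build_index(items)
print(search_box(index, 51.50, 0.01, 51.51, 0.02))  # ['a']
```

Note that when the box straddles the meridian (e.g. longitude -0.02 to
0.02) the bounding hash is the empty string, so search_box scans the whole
index before filtering, which is exactly the weakness described next.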

This approach has a weakness: it fares badly in places like London, where
your search area may straddle the prime meridian (the boundary between
positive and negative longitude). Hashes for points on either side of this
boundary (e.g. http://geohash.org/u10hb7951 and
http://geohash.org/gcpuzewfz) have no prefix in common, so you end up doing
a prefix scan on the empty string, which pulls in everything. The same
problem arises, albeit less dramatically, wherever the points in the search
area share only a short prefix.
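
A quick way to check the London example is to decode the two hashes. The
sketch below uses a hypothetical geohash_decode helper (standard geohash
base-32 alphabet assumed) that returns the centre of each hash's cell: both
points sit at roughly 51.48°N, one just east and one just west of 0°
longitude, yet the strings share no prefix at all.

```python
import os.path

# Standard geohash base-32 alphabet (no a, i, l, o).
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_decode(hash_str):
    """Return the (lat, lon) centre of the cell named by a geohash string."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    use_lon = True  # bits alternate, starting with longitude
    for ch in hash_str:
        value = _BASE32.index(ch)
        for shift in range(4, -1, -1):  # 5 bits per character, most significant first
            bit = (value >> shift) & 1
            if use_lon:
                mid = (lon_lo + lon_hi) / 2
                lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
            else:
                mid = (lat_lo + lat_hi) / 2
                lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
            use_lon = not use_lon
    return (lat_lo + lat_hi) / 2, (lon_lo + lon_hi) / 2


# The two example hashes from the message, one each side of the prime meridian.
for h in ("u10hb7951", "gcpuzewfz"):
    print(h, geohash_decode(h))
print(repr(os.path.commonprefix(["u10hb7951", "gcpuzewfz"])))  # ''
```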

Off the top of my head, there are two things you can do about this. First,
you can use a less concise geohash variant that subdivides the world less
quickly (e.g. fewer bits per character), increasing your chances of finding
a closely fitting bounding hash (i.e. common prefix) for any given area.
Second, rather than finding a single bounding hash, you could generate
multiple hashes that together cover the area, perform a prefix query for
each of them and aggregate the results.
This blog post describes the latter approach in more detail, along with some
theoretical optimisations:

http://blog.notdot.net/2009/11/Damn-Cool-Algorithms-Spatial-indexing-with-Quadtrees-and-Hilbert-Curves
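
The second remedy can be sketched roughly as follows. This is a simple grid
enumeration, not the optimised scheme from the blog post, and it assumes
the same kind of sorted (geohash, item_id) list as a stand-in for the
column family. cover_cells (hypothetical) works on the integer cell grid at
a fixed precision and builds each cell's geohash by interleaving the
longitude and latitude index bits; multi_prefix_search (also hypothetical)
runs one prefix scan per cell and unions the hits. It ignores boxes that
cross the 180° meridian, leaves the choice of precision to the caller (the
number of cells grows quickly with it), and returns candidates that still
need filtering against the exact search area.

```python
import bisect
import math

# Standard geohash base-32 alphabet (no a, i, l, o).
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def _cell_hash(lat_idx, lon_idx, precision):
    """Geohash of grid cell (lat_idx, lon_idx) at `precision`, via bit interleaving."""
    lon_left = (5 * precision + 1) // 2  # even bit positions carry longitude
    lat_left = (5 * precision) // 2      # odd bit positions carry latitude
    value = 0
    for n in range(5 * precision):
        if n % 2 == 0:
            lon_left -= 1
            bit = (lon_idx >> lon_left) & 1
        else:
            lat_left -= 1
            bit = (lat_idx >> lat_left) & 1
        value = (value << 1) | bit
    # Split the bit string into 5-bit groups, most significant first.
    return "".join(_BASE32[(value >> (5 * (precision - 1 - k))) & 31]
                   for k in range(precision))


def cover_cells(min_lat, min_lon, max_lat, max_lon, precision):
    """All geohash cells of the given precision that intersect the box."""
    lat_cells = 2 ** ((5 * precision) // 2)
    lon_cells = 2 ** ((5 * precision + 1) // 2)

    def lat_index(lat):
        return min(math.floor((lat + 90.0) / 180.0 * lat_cells), lat_cells - 1)

    def lon_index(lon):
        return min(math.floor((lon + 180.0) / 360.0 * lon_cells), lon_cells - 1)

    return sorted(_cell_hash(i, j, precision)
                  for i in range(lat_index(min_lat), lat_index(max_lat) + 1)
                  for j in range(lon_index(min_lon), lon_index(max_lon) + 1))


def multi_prefix_search(index, prefixes):
    """One prefix scan per cell over a sorted (geohash, item_id) list; union the hits."""
    hits = set()
    for prefix in prefixes:
        start = bisect.bisect_left(index, (prefix,))
        for geohash, item_id in index[start:]:
            if not geohash.startswith(prefix):
                break  # sorted keys: nothing later shares this prefix
            hits.add(item_id)
    return sorted(hits)


# A box straddling the prime meridian in London, covered at precision 5.
cells = cover_cells(51.47, -0.04, 51.49, 0.04, precision=5)
print(cells)  # ['gcpuz', 'u10hb']
# "u09tvw0f6" is a made-up hash for an item outside the box.
index = sorted([("u10hb7951", "east"), ("gcpuzewfz", "west"),
                ("u09tvw0f6", "elsewhere")])
print(multi_prefix_search(index, cells))  # ['east', 'west']
```

Two short scans replace the single empty-prefix scan that would otherwise
pull in the whole index.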

I also plan to investigate Local Lucene's approach, which uses Cartesian
tiers:

http://www.nsshutdown.com/projects/lucene/whitepaper/locallucene_v2.html



On Fri, Mar 19, 2010 at 1:20 PM, Brandon Williams <driftx@gmail.com> wrote:

> On Fri, Mar 19, 2010 at 9:06 AM, Joseph Stein <cryptcom@gmail.com> wrote:
>
>> Hi All, has anyone ever done geo coding to find distance based results
>> from storing long/lats with a starting long/lat and variable?
>>
> This thread might be helpful:
>
> http://n2.nabble.com/Help-Wrap-My-Head-Around-Cassandra-td4657302.html
>
> -Brandon
>
