hbase-user mailing list archives

From tim robertson <timrobertson...@gmail.com>
Subject Re: Spatial Databases on HBase (or Hadoop)
Date Fri, 19 Jun 2009 19:43:20 GMT
Hi Fred,

I was working on 150 million point records and 150,000 fairly detailed
polygons.  I had to batch it up and hold 40,000 polygons in memory at a
time in the MapReduce jobs.
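
For anyone wanting to try the batching idea, here is a minimal sketch in
plain Python, not the Java/GeoTools code I actually ran. It assumes
polygons are simple rings of (x, y) tuples, uses a bounding-box check as
a crude stand-in for the RTree lookup, and does the contains() with a
ray-casting test. All names (spatial_join, batches, etc.) are
hypothetical, and points on a polygon boundary are not handled specially.

```python
from itertools import islice


def bbox(ring):
    """Return (minx, miny, maxx, maxy) for a ring of (x, y) tuples."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return min(xs), min(ys), max(xs), max(ys)


def contains(ring, px, py):
    """Ray-casting test: True if (px, py) lies inside the ring."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line y = py can be crossed.
        if (y1 > py) != (y2 > py):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


def batches(iterable, size):
    """Yield successive lists of at most `size` items."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def spatial_join(points, polygons, batch_size=40000):
    """Yield (point_id, polygon_id) pairs, one polygon batch at a time.

    `points` must be re-iterable (e.g. a list) because it is scanned once
    per batch; `polygons` is an iterable of (polygon_id, ring) pairs.
    """
    for batch in batches(polygons, batch_size):
        # Assumption: a linear scan over bounding boxes replaces the RTree.
        indexed = [(pid, ring, bbox(ring)) for pid, ring in batch]
        for point_id, px, py in points:
            for pid, ring, (minx, miny, maxx, maxy) in indexed:
                if (minx <= px <= maxx and miny <= py <= maxy
                        and contains(ring, px, py)):
                    yield point_id, pid
```

In a real job the point scan would be the map input and each batch of
polygons would be loaded once per task, with a proper spatial index in
place of the linear bounding-box scan.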

If you are dealing with a whole bunch of points, might it be worth
clustering them into polygons first to get candidate points?
We are running this:
http://code.flickr.com/blog/2008/10/30/the-shape-of-alpha/ and
clustering 1 million points into multipolygons in 5 seconds.  That
might bring the count down to something manageable.

It is a problem of great interest to us also, so happy to discuss
ideas... http://biodivertido.blogspot.com/2008/11/reproducing-spatial-joins-using-hadoop.html
was one of my early tests.

Cheers

Tim


On Fri, Jun 19, 2009 at 9:37 PM, Fred Zappert<fzappert@gmail.com> wrote:
> Tim,
>
> Thanks. That suggests an implementation that could be very effective at the
> current scale.
>
> Regards,
>
> Fred.
>
> On Fri, Jun 19, 2009 at 2:27 PM, tim robertson <timrobertson100@gmail.com> wrote:
>
>> I've used it as a source for a bunch of point data, and then tested
>> them in polygons with a contains().  I ended up loading the polygons
>> into memory with an RTree index though using the GeoTools libraries.
>>
>> Cheers
>>
>> Tim
>>
>>
>> On Fri, Jun 19, 2009 at 9:22 PM, Fred Zappert<fzappert@gmail.com> wrote:
>> > Hi,
>> >
>> > I would like to know if anyone is using HBase for spatial databases.
>> >
>> > The requirements are relatively simple.
>> >
>> > 1. Two dimensions.
>> > 2. Each object represented as a point.
>> > 3. Basic query is nearest neighbor, with a few qualifications such as:
>> > a
>> >
>>
>
