From: Fred Zappert <fzappert@gmail.com>
To: hbase-user@hadoop.apache.org
Date: Sat, 20 Jun 2009 19:52:34 -0500
Subject: Re: Spatial Databases on HBase (or Hadoop)

Tim,

Thanks again for drilling into a very sound solution.

This problem is further partitioned because the vehicles and way points
belong to fleets, and there are several thousand fleets being tracked.

I need to look into the current implementation to see whether any
prediction is going on, because the current reporting interval for the
vehicles is 15 minutes. However, part of the architecture we're developing
is intended to deal with many more vehicles, and a reporting interval of
several times per minute.

I would also expect that many way points are common, such as weigh
stations and loading docks that are serviced by multiple fleets.

I'm new to the map-reduce paradigm, and this is a great example of its
utility.
Most of the GIS databases are extensions to traditional databases (Oracle,
Postgres, and MySQL), and it's nice to see that those are not needed, at
least for this application.

Regards,

Fred.

On Sat, Jun 20, 2009 at 6:45 AM, tim robertson wrote:
> Hi Fred,
>
> So I am guessing then your "real time" calculations are all going to
> be focused on the moving vehicles, right?
> If the way-points are relatively static, you can preprocess information
> about them offline (distance between each, data mining the average time
> taken to travel between two, etc.).
>
> So I am guessing you would need to find way-points relative to a given
> vehicle - if this is the case, I think you are going to need to
> investigate some kind of index for the way-points. We do this for our
> 150 million points by putting them in an identified 1 degree x 1
> degree cell (and then 0.1 x 0.1 degree cells), so that if someone is
> interested in points near a location, we first determine which cells
> are candidates, and immediately we have reduced the candidate points to
> check.
>
> In database terms, we have latitude, longitude, and then create a
> (cell_id int, centi_cell_id int).
>
> If you know the routes that a vehicle is taking, is there any way you
> could preplan its route and cache that, or somehow store known routes
> between way-points? This might allow you to really reduce the
> candidates to check.
>
> Just some ideas
>
> Tim
> skype: timrobertson100
>
>
> On Fri, Jun 19, 2009 at 10:16 PM, Fred Zappert wrote:
> > Tim,
> >
> > Thanks so much for the additional links.
> >
> > Our problem is for the moment much smaller - 4,000,000 mapped
> > way-points, and 80,000 moving vehicles.
> >
> > Clustering the way-points into polygons makes a lot of sense.
> >
> > Fred.
> >
> > On Fri, Jun 19, 2009 at 2:43 PM, tim robertson
> > <timrobertson100@gmail.com> wrote:
> >
> >> Hi Fred,
> >>
> >> I was working on 150 million point records, and 150,000 fairly
> >> detailed polygons.
> >> I had to batch it up and do 40,000 polygons in memory at a
> >> time in the MapReduce jobs.
> >>
> >> If you are dealing with a whole bunch of points, might it be worth
> >> clustering them into polygons first to get candidate points?
> >> We are running this:
> >> http://code.flickr.com/blog/2008/10/30/the-shape-of-alpha/ and
> >> clustering 1 million points into multipolygons in 5 seconds. This
> >> might get the numbers down to a sensible level.
> >>
> >> It is a problem of great interest to us also, so happy to discuss
> >> ideas...
> >> http://biodivertido.blogspot.com/2008/11/reproducing-spatial-joins-using-hadoop.html
> >> was one of my early tests.
> >>
> >> Cheers
> >>
> >> Tim
> >>
> >>
> >> On Fri, Jun 19, 2009 at 9:37 PM, Fred Zappert wrote:
> >> > Tim,
> >> >
> >> > Thanks. That suggests an implementation that could be very
> >> > effective at the current scale.
> >> >
> >> > Regards,
> >> >
> >> > Fred.
> >> >
> >> > On Fri, Jun 19, 2009 at 2:27 PM, tim robertson
> >> > <timrobertson100@gmail.com> wrote:
> >> >
> >> >> I've used it as a source for a bunch of point data, and then
> >> >> tested them in polygons with a contains(). I ended up loading the
> >> >> polygons into memory with an RTree index, though, using the
> >> >> GeoTools libraries.
> >> >>
> >> >> Cheers
> >> >>
> >> >> Tim
> >> >>
> >> >>
> >> >> On Fri, Jun 19, 2009 at 9:22 PM, Fred Zappert wrote:
> >> >> > Hi,
> >> >> >
> >> >> > I would like to know if anyone is using HBase for spatial
> >> >> > databases.
> >> >> >
> >> >> > The requirements are relatively simple.
> >> >> >
> >> >> > 1. Two dimensions.
> >> >> > 2. Each object represented as a point.
> >> >> > 3. Basic query is nearest neighbor, with a few qualifications
> >> >> > such as: a
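
[Archive note: Tim's 1-degree / 0.1-degree cell index described above can
be sketched roughly as follows. This is not code from the thread; the
id scheme, the neighbour scan, and all names here are illustrative
assumptions, only the (cell_id, centi_cell_id) idea comes from Tim's
mail.]

```python
# Sketch of the grid-cell index: each point gets a cell_id for its
# 1 x 1 degree cell and a centi_cell_id for its 0.1 x 0.1 degree
# sub-cell, so a "points near here" query only scans candidate cells.

def cell_id(lat, lon):
    """Integer id of the 1-degree cell containing (lat, lon).
    Latitude is shifted to 0..180 and longitude to 0..360 so ids are
    non-negative; there are 360 cells per latitude band (assumed scheme)."""
    return int(lat + 90) * 360 + int(lon + 180)

def centi_cell_id(lat, lon):
    """Id (0..99) of the 0.1-degree sub-cell within its 1-degree cell."""
    sub_lat = int((lat + 90) * 10) % 10
    sub_lon = int((lon + 180) * 10) % 10
    return sub_lat * 10 + sub_lon

def candidate_cells(lat, lon):
    """The 1-degree cell of (lat, lon) plus its 8 neighbours -- the
    candidate set for a nearest-neighbour search near a vehicle."""
    base_lat, base_lon = int(lat + 90), int(lon + 180)
    return {
        blat * 360 + (blon % 360)
        for blat in (base_lat - 1, base_lat, base_lat + 1)
        for blon in (base_lon - 1, base_lon, base_lon + 1)
        if 0 <= blat < 180
    }

# Example: index a few (hypothetical) way-points, then find candidates
# near a vehicle position before doing exact distance checks.
waypoints = {
    "weigh_station_a": (32.5, -96.8),
    "loading_dock_b": (33.2, -97.1),
    "depot_far_away": (48.9, 2.3),
}
index = {}
for name, (lat, lon) in waypoints.items():
    index.setdefault(cell_id(lat, lon), []).append(name)

vehicle = (32.9, -96.9)
candidates = [p for c in candidate_cells(*vehicle) for p in index.get(c, [])]
# Only the two nearby way-points survive the cell prefilter.
```

The same two integers map naturally onto an HBase row key or indexed
columns, so a region scan touches only the candidate cells.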
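
[Archive note: Tim's polygon step - a fast in-memory index to prefilter,
then an exact contains() test - can be sketched as below. He used a
GeoTools RTree; this sketch substitutes a plain bounding-box prefilter
and a ray-casting contains(), so all of the code is a stand-in, not the
actual implementation from the thread.]

```python
# Spatial-join sketch: cheap bounding-box prefilter (the role an RTree
# plays), then an exact point-in-polygon test via ray casting.

def contains(polygon, x, y):
    """Ray-casting point-in-polygon test; polygon is a list of (x, y)
    vertices in order, closed implicitly."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def bbox(polygon):
    """Axis-aligned bounding box (min_x, min_y, max_x, max_y)."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    return min(xs), min(ys), max(xs), max(ys)

def candidates_for(point, polygons):
    """Polygons whose bounding box contains the point -- the cheap
    prefilter an RTree would answer -- to be confirmed with contains()."""
    x, y = point
    out = []
    for poly in polygons:
        min_x, min_y, max_x, max_y = bbox(poly)
        if min_x <= x <= max_x and min_y <= y <= max_y:
            out.append(poly)
    return out

# Example join of one point against two polygons.
square = [(0, 0), (4, 0), (4, 4), (0, 4)]
far_square = [(10, 10), (12, 10), (12, 12), (10, 12)]
polygons = [square, far_square]
hits = [poly for poly in candidates_for((2, 2), polygons)
        if contains(poly, 2, 2)]
```

In a MapReduce job this pairing shows up as: broadcast a batch of
polygons (Tim's 40,000-at-a-time), map over the points, and emit only
the (point, polygon) pairs that pass contains().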