From: Fred Zappert <fzappert@gmail.com>
To: hbase-user@hadoop.apache.org
Date: Sat, 20 Jun 2009 19:52:34 -0500
Subject: Re: Spatial Databases on HBase (or Hadoop)

Tim,

Thanks again for drilling into a very sound solution.

This problem is further partitioned because the vehicles and way points
belong to fleets, and there are several thousand fleets being tracked.

I need to look into the current implementation to see whether any
prediction is going on, because the current reporting interval for the
vehicles is 15 minutes. However, part of the architecture we're developing
is intended to deal with many more vehicles, and a reporting interval of
several times per minute.

I would also expect that many way points are common, such as weigh
stations and loading docks that are serviced by multiple fleets.

I'm new to the map-reduce paradigm, and this is a great example of its
utility.
Most of the GIS databases are extensions to traditional databases (Oracle,
Postgres, and MySQL), and it's nice to see that those are not needed, at
least for this application.

Regards,

Fred.

On Sat, Jun 20, 2009 at 6:45 AM, tim robertson wrote:
> Hi Fred,
>
> So I am guessing then your "real time" calculations are all going to
> be focused on the moving vehicles, right?
> If the way-points are relatively static, you can preprocess information
> about them offline (distance between each, data mining the average time
> taken to travel between two, etc.).
>
> So I am guessing you would need to find way-points relative to a given
> vehicle - if this is the case, I think you are going to need to
> investigate some kind of index for the way-points. We do this for our
> 150 million points by putting them in an identified 1 degree x 1
> degree cell (and then 0.1 x 0.1 degree cells), so that if someone is
> interested in points near a location, we first determine which cells
> are candidates, and immediately we have reduced the candidate points to
> check.
>
> In database terms, we have latitude, longitude, and then create a
> (cell_id int, centi_cell_id int).
>
> If you know the routes that a vehicle is taking, is there any way you
> could preplan its route and cache that, or somehow store known routes
> between way-points? This might allow you to really reduce the
> candidates to check.
>
> Just some ideas
>
> Tim
> skype: timrobertson100
>
>
> On Fri, Jun 19, 2009 at 10:16 PM, Fred Zappert wrote:
> > Tim,
> >
> > Thanks so much for the additional links.
> >
> > Our problem is for the moment much smaller - 4,000,000 mapped
> > way-points, and 80,000 moving vehicles.
> >
> > Clustering the way-points into polygons makes a lot of sense.
> >
> > Fred.
> >
> > On Fri, Jun 19, 2009 at 2:43 PM, tim robertson
> > <timrobertson100@gmail.com> wrote:
> >
> >> Hi Fred,
> >>
> >> I was working on 150 million point records, and 150,000 fairly
> >> detailed polygons.
> >> I had to batch it up and do 40,000 polygons in memory at a
> >> time in the MapReduce jobs.
> >>
> >> If you are dealing with a whole bunch of points, might it be worth
> >> clustering them into polygons first to get candidate points?
> >> We are running this:
> >> http://code.flickr.com/blog/2008/10/30/the-shape-of-alpha/ and
> >> clustering 1 million points into multipolygons in 5 seconds. This
> >> might get the numbers down to a sensible level.
> >>
> >> It is a problem of great interest to us also, so happy to discuss
> >> ideas...
> >> http://biodivertido.blogspot.com/2008/11/reproducing-spatial-joins-using-hadoop.html
> >> was one of my early tests.
> >>
> >> Cheers
> >>
> >> Tim
> >>
> >>
> >> On Fri, Jun 19, 2009 at 9:37 PM, Fred Zappert wrote:
> >> > Tim,
> >> >
> >> > Thanks. That suggests an implementation that could be very
> >> > effective at the current scale.
> >> >
> >> > Regards,
> >> >
> >> > Fred.
> >> >
> >> > On Fri, Jun 19, 2009 at 2:27 PM, tim robertson
> >> > <timrobertson100@gmail.com> wrote:
> >> >
> >> >> I've used it as a source for a bunch of point data, and then
> >> >> tested them in polygons with a contains(). I ended up loading the
> >> >> polygons into memory with an RTree index, though, using the
> >> >> GeoTools libraries.
> >> >>
> >> >> Cheers
> >> >>
> >> >> Tim
> >> >>
> >> >>
> >> >> On Fri, Jun 19, 2009 at 9:22 PM, Fred Zappert wrote:
> >> >> > Hi,
> >> >> >
> >> >> > I would like to know if anyone is using HBase for spatial
> >> >> > databases.
> >> >> >
> >> >> > The requirements are relatively simple.
> >> >> >
> >> >> > 1. Two dimensions.
> >> >> > 2. Each object represented as a point.
> >> >> > 3. Basic query is nearest neighbor, with a few qualifications
> >> >> > such as: a
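
[Archive note: Tim's 1-degree / 0.1-degree cell index described above can
be sketched roughly as follows. This is not code from the thread; the
id scheme, the neighbour scan, and all names here are illustrative
assumptions, only the (cell_id, centi_cell_id) idea comes from Tim's
mail.]

```python
# Sketch of the grid-cell index: each point gets a cell_id for its
# 1 x 1 degree cell and a centi_cell_id for its 0.1 x 0.1 degree
# sub-cell, so a "points near here" query only scans candidate cells.

def cell_id(lat, lon):
    """Integer id of the 1-degree cell containing (lat, lon).
    Latitude is shifted to 0..180 and longitude to 0..360 so ids are
    non-negative; there are 360 cells per latitude band (assumed scheme)."""
    return int(lat + 90) * 360 + int(lon + 180)

def centi_cell_id(lat, lon):
    """Id (0..99) of the 0.1-degree sub-cell within its 1-degree cell."""
    sub_lat = int((lat + 90) * 10) % 10
    sub_lon = int((lon + 180) * 10) % 10
    return sub_lat * 10 + sub_lon

def candidate_cells(lat, lon):
    """The 1-degree cell of (lat, lon) plus its 8 neighbours -- the
    candidate set for a nearest-neighbour search near a vehicle."""
    base_lat, base_lon = int(lat + 90), int(lon + 180)
    return {
        blat * 360 + (blon % 360)
        for blat in (base_lat - 1, base_lat, base_lat + 1)
        for blon in (base_lon - 1, base_lon, base_lon + 1)
        if 0 <= blat < 180
    }

# Example: index a few (hypothetical) way-points, then find candidates
# near a vehicle position before doing exact distance checks.
waypoints = {
    "weigh_station_a": (32.5, -96.8),
    "loading_dock_b": (33.2, -97.1),
    "depot_far_away": (48.9, 2.3),
}
index = {}
for name, (lat, lon) in waypoints.items():
    index.setdefault(cell_id(lat, lon), []).append(name)

vehicle = (32.9, -96.9)
candidates = [p for c in candidate_cells(*vehicle) for p in index.get(c, [])]
# Only the two nearby way-points survive the cell prefilter.
```

The same two integers map naturally onto an HBase row key or indexed
columns, so a region scan touches only the candidate cells.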
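
[Archive note: Tim's polygon step - a fast in-memory index to prefilter,
then an exact contains() test - can be sketched as below. He used a
GeoTools RTree; this sketch substitutes a plain bounding-box prefilter
and a ray-casting contains(), so all of the code is a stand-in, not the
actual implementation from the thread.]

```python
# Spatial-join sketch: cheap bounding-box prefilter (the role an RTree
# plays), then an exact point-in-polygon test via ray casting.

def contains(polygon, x, y):
    """Ray-casting point-in-polygon test; polygon is a list of (x, y)
    vertices in order, closed implicitly."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def bbox(polygon):
    """Axis-aligned bounding box (min_x, min_y, max_x, max_y)."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    return min(xs), min(ys), max(xs), max(ys)

def candidates_for(point, polygons):
    """Polygons whose bounding box contains the point -- the cheap
    prefilter an RTree would answer -- to be confirmed with contains()."""
    x, y = point
    out = []
    for poly in polygons:
        min_x, min_y, max_x, max_y = bbox(poly)
        if min_x <= x <= max_x and min_y <= y <= max_y:
            out.append(poly)
    return out

# Example join of one point against two polygons.
square = [(0, 0), (4, 0), (4, 4), (0, 4)]
far_square = [(10, 10), (12, 10), (12, 12), (10, 12)]
polygons = [square, far_square]
hits = [poly for poly in candidates_for((2, 2), polygons)
        if contains(poly, 2, 2)]
```

In a MapReduce job this pairing shows up as: broadcast a batch of
polygons (Tim's 40,000-at-a-time), map over the points, and emit only
the (point, polygon) pairs that pass contains().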