spark-user mailing list archives

From Russ Weeks <>
Subject Re: Indexing Support
Date Sun, 18 Oct 2015 23:10:10 GMT
Distributed R-Trees are not very common. Most "big data" spatial solutions
collapse multi-dimensional data into a distributed one-dimensional index
using a space-filling curve. Many implementations exist outside of Spark,
e.g. for HBase or Accumulo. It's simple enough to write a map function that
takes a longitude+latitude pair and converts it to a position on a Z curve,
so you can work with a PairRDD of something like <Long,Feature>. It's more
complicated to convert a geospatial query expressed as a bounding box into
a set of disjoint curve intervals, but there are good examples out there.
The excellent Accumulo Recipes project has an implementation of such an
algorithm; it would be pretty easy to port it to work with a PairRDD as
described above.
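
To make the two steps above concrete, here is a minimal sketch in plain
Python (no Spark dependency). It is not the Accumulo Recipes
implementation. The function names (interleave, quantize, z_value,
z_ranges), the 16 bits per dimension, and the max_depth cutoff are all
assumptions chosen for illustration. z_value maps a longitude+latitude pair
to a position on a Z curve, and z_ranges turns a bounding box, given in grid
cell coordinates, into sorted disjoint curve intervals by walking a quadtree.

```python
def interleave(x: int, y: int, bits: int) -> int:
    """Interleave the low `bits` bits of x and y (x on even bit positions)."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def quantize(value: float, lo: float, hi: float, bits: int) -> int:
    """Map value in [lo, hi] to a grid cell index in [0, 2**bits - 1]."""
    cells = 1 << bits
    idx = int((value - lo) / (hi - lo) * cells)
    return min(max(idx, 0), cells - 1)  # clamp, so value == hi stays in range


def z_value(lon: float, lat: float, bits: int = 16) -> int:
    """Position of a longitude/latitude pair on the Z curve (illustrative)."""
    x = quantize(lon, -180.0, 180.0, bits)
    y = quantize(lat, -90.0, 90.0, bits)
    return interleave(x, y, bits)


def z_ranges(xmin, ymin, xmax, ymax, bits, max_depth=None):
    """Cover the inclusive grid box with sorted, disjoint, inclusive Z intervals.

    Cells that only partly overlap the box at max_depth are emitted whole,
    so a coarse max_depth can return Z values outside the box.
    """
    if max_depth is None:
        max_depth = bits
    ranges = []

    def visit(x0, y0, depth):
        size = 1 << (bits - depth)
        x1, y1 = x0 + size - 1, y0 + size - 1
        if x1 < xmin or x0 > xmax or y1 < ymin or y0 > ymax:
            return  # quadtree cell is disjoint from the query box
        inside = xmin <= x0 and x1 <= xmax and ymin <= y0 and y1 <= ymax
        if inside or depth == max_depth:
            # An aligned square cell occupies one contiguous run on the curve.
            start = interleave(x0, y0, bits)
            end = start + size * size - 1
            if ranges and ranges[-1][1] + 1 == start:
                ranges[-1] = (ranges[-1][0], end)  # merge adjacent intervals
            else:
                ranges.append((start, end))
            return
        half = size // 2
        # Visit children in Z order so the output is already sorted.
        for dx, dy in ((0, 0), (half, 0), (0, half), (half, half)):
            visit(x0 + dx, y0 + dy, depth + 1)

    visit(0, 0, 0)
    return ranges
```

With Spark, z_value could be called inside a map to key each feature by its
curve position, and the intervals from z_ranges could drive a filter on those
keys. Because a coarse max_depth over-covers the box, the surviving features
would still need an exact bounding-box check afterwards.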

On Sun, Oct 18, 2015 at 3:26 PM Jerry Lam <> wrote:

> I'm interested in it, but I doubt there will be R-tree indexing support in
> the near future, as Spark is not a database. You might have better luck
> looking at databases with spatial indexing support out of the box.
> Cheers
> On 2015-10-18, at 17:16, Mustafa Elbehery <>
> wrote:
> Hi All,
> I am trying to use Spark to process *spatial data*. I am looking for
> R-tree indexing support in the best case, but I would be fine with any other
> indexing capability as well, just to improve performance.
> Has anyone had the same issue before, and is there any information regarding
> index support in future releases?
> Regards.
> --
> Mustafa Elbehery
> EIT ICT Labs Master School <>
> +49(0)15750363097
> skype: mustafaelbehery87
