Distributed R-Trees are not very common. Most "big data" spatial solutions collapse multi-dimensional data into a distributed one-dimensional index using a space-filling curve. Many implementations exist outside of Spark, e.g. for HBase or Accumulo. It's simple enough to write a map function that takes a longitude+latitude pair and converts it to a position on a Z curve, so you can work with a PairRDD of something like <Long,Feature>. It's more complicated to convert a geospatial query expressed as a bounding box into a set of disjoint curve intervals, but there are good examples out there. The excellent Accumulo Recipes project has an implementation of such an algorithm; it would be pretty easy to port it to work with a PairRDD as described above.
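
To make the two steps above concrete, here is a minimal sketch in plain Python (no Spark dependency). It is not the Accumulo Recipes algorithm, just a simple quadtree descent written for illustration. The names quantize, interleave, z_index and z_ranges, the 16 bits per dimension, and the inclusive (lo, hi) interval format are all my own assumptions.

```python
from typing import List, Optional, Tuple

BITS = 16  # bits per dimension (assumption; gives a 32-bit Z value)


def quantize(value: float, lo: float, hi: float, bits: int = BITS) -> int:
    """Map a value in [lo, hi] to an integer grid cell in [0, 2**bits - 1]."""
    cells = 1 << bits
    cell = int((value - lo) / (hi - lo) * cells)
    return min(max(cell, 0), cells - 1)  # clamp so that value == hi stays in range


def interleave(x: int, y: int, bits: int = BITS) -> int:
    """Interleave bits: x goes to the even bit positions, y to the odd ones."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def z_index(lon: float, lat: float, bits: int = BITS) -> int:
    """Position of a longitude/latitude pair on the Z curve."""
    return interleave(quantize(lon, -180.0, 180.0, bits),
                      quantize(lat, -90.0, 90.0, bits), bits)


def z_ranges(xmin: int, ymin: int, xmax: int, ymax: int,
             bits: int = BITS,
             max_depth: Optional[int] = None) -> List[Tuple[int, int]]:
    """Cover a box given in grid cells (inclusive bounds) with sorted,
    disjoint, inclusive Z intervals, by descending a quadtree.

    With max_depth < bits the cover is coarser: partially overlapping cells
    are taken whole, so results need an exact filter afterwards.
    """
    depth = bits if max_depth is None else min(max_depth, bits)
    ranges: List[Tuple[int, int]] = []

    def visit(cx: int, cy: int, level: int) -> None:
        shift = bits - level
        x0, y0 = cx << shift, cy << shift
        x1, y1 = x0 + (1 << shift) - 1, y0 + (1 << shift) - 1
        if x1 < xmin or x0 > xmax or y1 < ymin or y0 > ymax:
            return  # quadtree cell is disjoint from the query box
        inside = xmin <= x0 and x1 <= xmax and ymin <= y0 and y1 <= ymax
        if inside or level == depth:
            # A quadtree cell is one contiguous run on the Z curve.
            lo = interleave(x0, y0, bits)
            hi = lo + (1 << (2 * shift)) - 1
            if ranges and ranges[-1][1] + 1 == lo:
                ranges[-1] = (ranges[-1][0], hi)  # merge with adjacent interval
            else:
                ranges.append((lo, hi))
            return
        # Children are visited in Z order, so the output stays sorted.
        for dy in (0, 1):
            for dx in (0, 1):
                visit(2 * cx + dx, 2 * cy + dy, level + 1)

    visit(0, 0, 0)
    return ranges
```

Keying each feature by z_index(lon, lat) gives the <Long,Feature> pairing mentioned above. A bounding-box query would then quantize its corners, call z_ranges, and keep only the pairs whose key falls in one of the returned intervals. An exact check against the box is still needed afterwards whenever max_depth is set below bits, since the coarser cover can include cells outside the box.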

On Sun, Oct 18, 2015 at 3:26 PM Jerry Lam <chilinglam@gmail.com> wrote:
I'm interested in it, but I doubt there will be R-tree indexing support in the near future, as Spark is not a database. You might have better luck looking at databases with spatial indexing support out of the box.



On 2015-10-18, at 17:16, Mustafa Elbehery <elbeherymustafa@gmail.com> wrote:

Hi All, 

I am trying to use Spark to process spatial data. Ideally I am looking for R-Tree indexing support, but I would be fine with any other indexing capability as well, just to improve performance.

Has anyone had the same issue before, and is there any information regarding index support in future releases?


Mustafa Elbehery
skype: mustafaelbehery87