Hey, all,
I'm looking at switching my geospatial index to a partitioned index to
smooth out some hotspots. So for any query, I'll have a bunch of ranges
representing intervals on a Hilbert curve, plus a bunch of partitions, each
of which needs to be scanned for every range.
The way that the (excellent!) Accumulo Recipes geospatial store addresses
this is to take the product of the partitions and the curve intervals[1].
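For illustration, here is a minimal sketch of the product approach in plain Java (16+, for records). It assumes a hypothetical row layout: a zero-padded partition prefix, an underscore, then the zero-padded Hilbert index. That is not necessarily the layout Accumulo Recipes uses. A stand-in `RowRange` record replaces Accumulo's `Range` so the sketch runs without the Accumulo jars. The number of ranges is simply partitions times intervals.

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionedRanges {

    /** A closed interval [lo, hi] of Hilbert-curve indices (hypothetical type). */
    record CurveInterval(long lo, long hi) {}

    /** Inclusive row range; a stand-in for org.apache.accumulo.core.data.Range. */
    record RowRange(String startRow, String endRow) {}

    /** Hypothetical row key: 3-digit partition, '_', 19-digit zero-padded curve index. */
    static String rowKey(int partition, long curveIndex) {
        // Zero padding keeps lexicographic row order equal to numeric order.
        return String.format("%03d_%019d", partition, curveIndex);
    }

    /** One row range per (partition, interval) pair. */
    static List<RowRange> product(int numPartitions, List<CurveInterval> intervals) {
        List<RowRange> ranges = new ArrayList<>(numPartitions * intervals.size());
        for (int p = 0; p < numPartitions; p++) {
            for (CurveInterval iv : intervals) {
                ranges.add(new RowRange(rowKey(p, iv.lo()), rowKey(p, iv.hi())));
            }
        }
        return ranges;
    }

    public static void main(String[] args) {
        List<CurveInterval> intervals =
                List.of(new CurveInterval(10, 20), new CurveInterval(35, 40));
        List<RowRange> ranges = product(4, intervals);
        System.out.println(ranges.size() + " ranges"); // prints "8 ranges" (4 partitions x 2 intervals)
        System.out.println(ranges.get(0));
    }
}
```

In a real client each pair would become `new Range(start, true, end, true)`, and the whole list would be passed to `BatchScanner.setRanges(...)`, which queries the relevant tablet servers in parallel.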
It seems like an alternative would be to encode the curve intervals as a
property of a custom iterator (I need one anyway to filter out extraneous
points from the search area) and then have the client just scan
(-inf, +inf), which I think is more typical when querying a partitioned index?
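A minimal sketch of the alternative, again in plain Java and under assumptions: the option name `curveIntervals`, the `lo-hi,lo-hi` encoding and the row layout are all hypothetical. The client would serialize the sorted, non-overlapping intervals into one string option; the iterator would parse it once in `init()` and test each row's curve index with a binary search.

```java
public class CurveIntervalOption {

    /** Hypothetical option name shared by the client and the iterator. */
    static final String OPTION_NAME = "curveIntervals";

    /** Encode sorted, disjoint, non-negative [lo, hi] pairs as "lo-hi,lo-hi,...". */
    static String encode(long[][] intervals) {
        StringBuilder sb = new StringBuilder();
        for (long[] iv : intervals) {
            if (sb.length() > 0) sb.append(',');
            sb.append(iv[0]).append('-').append(iv[1]);
        }
        return sb.toString();
    }

    /** Parse the option value back into [lo, hi] pairs (what init() would do). */
    static long[][] decode(String value) {
        if (value.isEmpty()) return new long[0][];
        String[] parts = value.split(",");
        long[][] out = new long[parts.length][];
        for (int i = 0; i < parts.length; i++) {
            String[] bounds = parts[i].split("-");
            out[i] = new long[] {Long.parseLong(bounds[0]), Long.parseLong(bounds[1])};
        }
        return out;
    }

    /** Binary search: true if idx lies inside one of the sorted, disjoint intervals. */
    static boolean contains(long[][] intervals, long idx) {
        int lo = 0, hi = intervals.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (idx < intervals[mid][0]) hi = mid - 1;
            else if (idx > intervals[mid][1]) lo = mid + 1;
            else return true;
        }
        return false;
    }

    /** Hypothetical row layout "PPP_<curve index>": drop the partition prefix. */
    static long curveIndexOf(String row) {
        return Long.parseLong(row.substring(row.indexOf('_') + 1));
    }

    public static void main(String[] args) {
        String opt = encode(new long[][] {{10, 20}, {35, 40}});
        long[][] parsed = decode(opt);
        System.out.println(opt);                                                   // 10-20,35-40
        System.out.println(contains(parsed, curveIndexOf("002_0000000000000000037"))); // true
        System.out.println(contains(parsed, curveIndexOf("002_0000000000000000025"))); // false
    }
}
```

On the client side the string would be attached with `IteratorSetting.addOption(...)` and registered via `addScanIterator(...)`. One trade-off to keep in mind: with a single (-inf, +inf) range the filter only discards keys after they are read, so every tablet still reads all of its rows unless the iterator also seeks forward to the next interval itself.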
Can anybody comment on which approach is preferred? Is it common to expose
the number of partitions in the index and the encoding of those partitions
to client code? Am I needlessly worried that taking the product of the
curve intervals and the partitions will produce too many ranges?
Thanks,
Russ
[1] https://github.com/calrissian/accumulorecipes/blob/master/store/geospatialstore/src/main/java/org/calrissian/accumulorecipes/geospatialstore/impl/AccumuloGeoSpatialStore.java#L190
