A couple of related issues come up when considering implementing a
dimensionality-reducing encoding; I just want to toss those out to see what
people think the interface might look like.
There are a couple of aspects that could be brought in here, but let's keep
it simple and consider the original question: (lat/lon) -> number.
Desired precision of the binning process
The more bits we add to the zcurve, the more precise our comparison; i.e.
a 63-bit key would have more "locations" to sort by than a 24-bit key.
Would you see a reasonable default getting picked, this being made
user-configurable, or both? (i.e. default to a sane value, with extended
options via a new constructor?)
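For what it's worth, here's a rough sketch of what the configurable-precision
version might look like (all names here are hypothetical, not from any
existing Accumulo API): the bits-per-dimension count is the knob, and the key
is just the two binned coordinates bit-interleaved.

```java
// Hypothetical sketch of a precision-configurable 2D z-curve encoder.
// Class and method names are illustrative only.
public class ZCurve {
    private final int bitsPerDim; // e.g. 31 gives a 62-bit key

    public ZCurve(int bitsPerDim) {
        if (bitsPerDim < 1 || bitsPerDim > 31) {
            throw new IllegalArgumentException("bitsPerDim must be in [1, 31]");
        }
        this.bitsPerDim = bitsPerDim;
    }

    // Quantize a coordinate in [min, max] to an unsigned bitsPerDim-bit bin.
    long bin(double v, double min, double max) {
        double scaled = (v - min) / (max - min);  // normalize to [0, 1]
        long maxBin = (1L << bitsPerDim) - 1;
        return Math.min((long) (scaled * (maxBin + 1)), maxBin);
    }

    // Interleave the bits of the binned coordinates: lon bits in even
    // positions, lat bits in odd positions (bit 0 = least significant).
    public long encode(double lon, double lat) {
        long x = bin(lon, -180.0, 180.0);
        long y = bin(lat, -90.0, 90.0);
        long key = 0L;
        for (int i = 0; i < bitsPerDim; i++) {
            key |= ((x >> i) & 1L) << (2 * i);
            key |= ((y >> i) & 1L) << (2 * i + 1);
        }
        return key;
    }
}
```

A default of 31 bits per dimension (the largest that fits a signed long)
could then coexist with a constructor taking an explicit bit count.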
Semantics for turning two lat/long pairs into a range
I'm extrapolating here, but the only reason I see that locality matters is
if we want to preserve locality for range searches. The internal
implementation of the encoding/lexicoding process is going to directly
impact the implementation of the range query.
Now sure, someone could encode the lower left point, encode the upper right
point, and construct a range out of that to pass for a scan, but that's
going to be wildly inefficient in most cases. See:
https://dl.dropboxusercontent.com/u/6649380/bbox.png
If we just lexicode the lower left and upper right we traverse across the
entire curve, hitting lots of areas that aren't actually in the original
range.
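To make that concrete, here's a toy illustration (names hypothetical, not any
existing API) on a 4x4 grid, i.e. 2 bits per dimension: lexicoding just the
two corners of a window and scanning that single 1D key range visits cells
well outside the window.

```java
// Toy demonstration of corner-to-corner range slop on a 4x4 z-curve grid.
public class BboxSlop {
    // Interleave 2-bit x and y: x in even bit positions, y in odd.
    static int interleave2(int x, int y) {
        int key = 0;
        for (int i = 0; i < 2; i++) {
            key |= ((x >> i) & 1) << (2 * i);
            key |= ((y >> i) & 1) << (2 * i + 1);
        }
        return key;
    }

    // For the window [x0,x1] x [y0,y1], build the naive 1D range from the
    // corner keys and count: {cells actually in the window, cells scanned}.
    public static int[] cornerRangeStats(int x0, int y0, int x1, int y1) {
        int lo = interleave2(x0, y0), hi = interleave2(x1, y1);
        int inside = 0, total = hi - lo + 1;
        for (int x = 0; x < 4; x++)
            for (int y = 0; y < 4; y++) {
                int k = interleave2(x, y);
                if (k >= lo && k <= hi
                        && x >= x0 && x <= x1 && y >= y0 && y <= y1)
                    inside++;
            }
        return new int[]{inside, total};
    }
}
```

For example, the 4-cell window x in [0,1], y in [1,2] yields the naive key
range [2, 9]: 8 keys scanned for 4 cells wanted, and the waste grows much
worse at realistic precisions.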
Now we can turn a single 2D range into a set of 1D ranges. There is some
potential tuning here, as the algorithm trades off the time to compute the
ranges (and the number of ranges produced) against "slop" (the inclusion of
areas which aren't actually in the original query).
Would you see a static method, perhaps on the zcurve lexicoder, that returns
a series of ranges based on an input window? Some other mechanism?
And in the case of "slop": would we just document that the ranges could
actually include values not expected, or would we always fully decompose?
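One possible shape for that method, sketched here with hypothetical names
(nothing below is an existing Accumulo API): recursively split the curve's
quadrants, emit a single 1D range for any quadrant fully inside the window,
and stop subdividing at a configurable depth, at which point partially
covered quadrants are emitted whole - that cutoff is exactly the
slop-vs-number-of-ranges knob. This works in already-binned grid
coordinates for simplicity.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of z-curve range decomposition via quadtree recursion.
public class ZRangeDecomposer {
    // Interleave bits of (x, y): x in even positions, y in odd.
    static long interleave(long x, long y, int bits) {
        long key = 0;
        for (int i = 0; i < bits; i++) {
            key |= ((x >> i) & 1L) << (2 * i);
            key |= ((y >> i) & 1L) << (2 * i + 1);
        }
        return key;
    }

    // Decompose the window [x0,x1] x [y0,y1] (inclusive, in binned grid
    // coordinates on a 2^bits x 2^bits grid) into 1D z-key ranges.
    // maxDepth bounds the recursion; quadrants still only partially covered
    // at maxDepth are emitted whole, which is the "slop".
    public static List<long[]> decompose(long x0, long y0, long x1, long y1,
                                         int bits, int maxDepth) {
        List<long[]> out = new ArrayList<>();
        recurse(0, 0, bits, 0, x0, y0, x1, y1, bits, maxDepth, out);
        merge(out);
        return out;
    }

    private static void recurse(long qx, long qy, int qBits, int depth,
                                long x0, long y0, long x1, long y1,
                                int bits, int maxDepth, List<long[]> out) {
        long size = 1L << qBits; // quadrant side length
        long qx1 = qx + size - 1, qy1 = qy + size - 1;
        if (qx1 < x0 || qx > x1 || qy1 < y0 || qy > y1) return; // disjoint
        boolean contained = qx >= x0 && qx1 <= x1 && qy >= y0 && qy1 <= y1;
        if (contained || depth >= maxDepth || qBits == 0) {
            // An aligned quadrant is one contiguous run of z-keys.
            long lo = interleave(qx, qy, bits);
            out.add(new long[]{lo, lo + size * size - 1});
            return;
        }
        long half = size >> 1; // visit children in ascending z-key order
        recurse(qx, qy, qBits - 1, depth + 1, x0, y0, x1, y1, bits, maxDepth, out);
        recurse(qx + half, qy, qBits - 1, depth + 1, x0, y0, x1, y1, bits, maxDepth, out);
        recurse(qx, qy + half, qBits - 1, depth + 1, x0, y0, x1, y1, bits, maxDepth, out);
        recurse(qx + half, qy + half, qBits - 1, depth + 1, x0, y0, x1, y1, bits, maxDepth, out);
    }

    // Coalesce adjacent ranges (emission order is already ascending).
    private static void merge(List<long[]> ranges) {
        for (int i = ranges.size() - 1; i > 0; i--) {
            if (ranges.get(i - 1)[1] + 1 == ranges.get(i)[0]) {
                ranges.get(i - 1)[1] = ranges.get(i)[1];
                ranges.remove(i);
            }
        }
    }
}
```

On the 4x4 example above, full decomposition of x in [0,1], y in [1,2] gives
the exact ranges [2,3] and [8,9]; capping maxDepth at 1 gives [0,3] and
[8,11] - fewer, cheaper ranges, but with slop.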

The other, "making it more complicated" questions would probably revolve
around generalizing this to a multidimensional lexicoder. I would expect
the WGS84 (lat/long) encoder would just be a 2D instance with -180/+180,
-90/+90 bounds. There are probably completely different cases where it
would be useful to have a locality-sensitive hash.
But with the big caveat that this requires more methods: I now need a way
of defining a range for each of the dimensions (which before were defined
as lat/lon), and I need an interface/class to pass those dimensions to the
encoder, and should be able to use those same definitions to decode my
values on the way out.
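Something like the following, perhaps (again, every name here is made up for
illustration, and this simplified sketch assumes all dimensions share a bit
count): each dimension carries its own bounds and precision, and the same
definitions drive both encode and decode.

```java
// Hypothetical shape for a generalized N-dimensional lexicoder.
public class DimensionedLexicoder {
    public static final class Dimension {
        final double min, max;
        final int bits;
        public Dimension(double min, double max, int bits) {
            this.min = min; this.max = max; this.bits = bits;
        }
        long bin(double v) { // quantize [min, max] to an unsigned bits-bit bin
            long maxBin = (1L << bits) - 1;
            return Math.min((long) (((v - min) / (max - min)) * (maxBin + 1)), maxBin);
        }
        double unbin(long b) { // bin midpoint: decoding is inherently lossy
            return min + (b + 0.5) * (max - min) / (1L << bits);
        }
    }

    private final Dimension[] dims;
    public DimensionedLexicoder(Dimension... dims) { this.dims = dims; }

    // Round-robin bit interleaving across N dimensions: dimension d
    // contributes its bit i at key position i*N + d.
    public long encode(double... values) {
        int n = dims.length, bits = dims[0].bits;
        long key = 0L;
        for (int d = 0; d < n; d++) {
            long b = dims[d].bin(values[d]);
            for (int i = 0; i < bits; i++)
                key |= ((b >> i) & 1L) << (i * n + d);
        }
        return key;
    }

    public double[] decode(long key) {
        int n = dims.length, bits = dims[0].bits;
        double[] out = new double[n];
        for (int d = 0; d < n; d++) {
            long b = 0L;
            for (int i = 0; i < bits; i++)
                b |= ((key >> (i * n + d)) & 1L) << i;
            out[d] = dims[d].unbin(b);
        }
        return out;
    }
}
```

The WGS84 case would then just be
new DimensionedLexicoder(new Dimension(-180, 180, 16), new Dimension(-90, 90, 16)),
with decode returning each coordinate to within one bin width.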
I'm not trying to make a mountain out of a molehill; one could definitely
pick some sane defaults, document behaviour, and put in a WGS84-specific
implementation. I'm just thinking about what the minimum viable piece is
that's needed for this to be useful, and I suspect it's the range
decomposition, as I'd guess *everyone* would really need that (if they
cared about locality in the first place).
My main question here would be: to what extent would you see this going? A
one-off for WGS84, or a more generic multidimensional lexicoder? And in
either case, what would the thoughts/consensus be on implementing the
additional methods (and possibly classes) required?
On Fri, Jul 25, 2014 at 12:37 PM, Sean Busbey <busbey@cloudera.com> wrote:
>
> On Fri, Jul 25, 2014 at 8:45 AM, Anthony F <afccri@gmail.com> wrote:
>
>>
>> GeoMesa plans to update to 1.6 as soon as we get past a 1.0 RELEASE
>> (which is waiting on approval by the Eclipse Foundation's IP review process
>> and should happen in a month or so). I think we could break out a zcurve
>> lexicoder and contribute before then though. I'll poke around at the
>> lexicoder stuff in 1.6 and see what it would take to use that interface now.
>>
>>
>
> That sounds great Anthony. Let us know if we can help with anything!
>
>
>
> --
> Sean
>
