accumulo-user mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: Map Lexicoder
Date Mon, 28 Dec 2015 22:39:51 GMT
Gotcha, thanks for the background.

I think as long as you can preserve the same level of compatibility with 
the other lexicoders, this would be a nice addition. If it's an itch you 
want to scratch, others probably will want to do the same too :)

Keith probably knows the most about what currently works off the top of 
his head (since he wrote the Lexicoders, IIRC), but I imagine he's 
taking some time off work and isn't watching the mailing list closely.

If you get stuck with how to implement this, let me know and I can try 
to poke around at the implementation too.

Adam J. Shook wrote:
> Hi Josh,
>
> Thanks for the advice.  I'm with you on using the CQ and Value instead
> of putting the whole map into a Value, but what I am working on uses a
> relational model for mapping data to Accumulo and expects the value
> of the cell to be in the Value.  There are certainly some optimization
> opportunities in using the 'better' ways of storing data in Accumulo,
> but I'd like to get this working before diving into that rabbit hole.
>
> From a brief look, the ListLexicoder encodes each element of the list using
> a sub-lexicoder and escapes each element (0x00 -> 0x01 0x01 and 0x01 ->
> 0x01 0x02).  The voodoo here escapes me a little (pun!), but it seems to
> be enough to enable multi-dimensional arrays encoded by nesting
> ListLexicoders (up to 4D, haven't tried a fifth dimension).  I would
> expect something similar could be done using a Map.  Would a
> MapLexicoder be something worth contributing to the project?  I'd be
> happy to give it a stab.
>
> --Adam
>
> On Mon, Dec 28, 2015 at 12:21 PM, Josh Elser <josh.elser@gmail.com> wrote:
>
>     Looks like you would have to implement some kind of ComparableMap to
>     be able to use the PairLexicoder (see that the parameterization
>     requires both types in the Pair to implement Comparable). The Pair
>     lexicoder requires these Comparable types to align itself with the
>     original goal of the Lexicoders: provide byte-array serialization
>     for types whose sort order matches the original object's ordering.
>
>     Typically, when we have key-value style data we want to put in
>     Accumulo, it makes sense to leverage the Column Qualifier and the
>     Value, instead of serializing everything into one Accumulo Value.
>     Iterators make it easy to do server-side predicates and
>     transformations. My hunch is that this is another reason why you
>     don't already see a MapLexicoder provided.
>
>     One technical difficulty you might run into implementing a
>     generalized MapLexicoder is how you delimit the key and value in one
>     pair and how you delimit many pairs from each other. Commonly, the
>     "null" byte (\x00) is used as a separator since it doesn't often
>     appear in user-data. I'm not sure if some of the other Lexicoders
>     already use this in their serialization (e.g. the ListLexicoder
>     might, I haven't looked at the code). Nesting Lexicoders generically
>     might be tricky (although not impossible) -- thought it was worth
>     mentioning to make sure you thought about it.
>
>
>     Adam J. Shook wrote:
>
>         Hello all,
>
>         Any suggestions for using a Map Lexicoder (or implementing one)?
>         I am currently using a new ListLexicoder(new PairLexicoder(some
>         lexicoder, some lexicoder)), which is working for single maps.
>         However, when one of the lexicoders in the Pair is itself a Map
>         (and therefore another ListLexicoder(PairLexicoder)), an exception
>         is being thrown because ArrayList is not Comparable.
>
>         Regards,
>         --Adam
>
>

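For anyone following along, the escaping scheme Adam describes above (0x00 -> 0x01 0x01 and 0x01 -> 0x01 0x02, with elements then joined by a 0x00 separator) can be sketched in plain Java. This is only an illustration and not the Accumulo source: the class name EscapeSketch and its method names are made up for the example, and the 0x00 separator is an assumption taken from the discussion in the thread. It shows why nesting works: once an inner list is escaped it contains no 0x00 bytes, so the outer list can safely use 0x00 as its own delimiter.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the Accumulo ListLexicoder implementation.
public class EscapeSketch {

    // Escape an element so it contains no 0x00 byte:
    // 0x00 -> 0x01 0x01 and 0x01 -> 0x01 0x02; all other bytes pass through.
    static byte[] escape(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte b : in) {
            if (b == 0x00) {
                out.write(0x01);
                out.write(0x01);
            } else if (b == 0x01) {
                out.write(0x01);
                out.write(0x02);
            } else {
                out.write(b);
            }
        }
        return out.toByteArray();
    }

    // Inverse of escape(). Assumes well-formed input, i.e. every 0x01
    // is followed by either 0x01 or 0x02.
    static byte[] unescape(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < in.length; i++) {
            if (in[i] == 0x01) {
                i++;
                out.write(in[i] - 1); // 0x01 -> 0x00, 0x02 -> 0x01
            } else {
                out.write(in[i]);
            }
        }
        return out.toByteArray();
    }

    // Escape each element and join the results with a 0x00 separator.
    static byte[] encodeList(List<byte[]> elements) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < elements.size(); i++) {
            if (i > 0) {
                out.write(0x00);
            }
            byte[] escaped = escape(elements.get(i));
            out.write(escaped, 0, escaped.length);
        }
        return out.toByteArray();
    }

    // Split on 0x00 and unescape each piece. Limitation of this sketch:
    // an empty list and a list holding one empty element both encode to
    // zero bytes; zero bytes are decoded here as the empty list.
    static List<byte[]> decodeList(byte[] data) {
        List<byte[]> result = new ArrayList<>();
        if (data.length == 0) {
            return result;
        }
        int start = 0;
        for (int i = 0; i <= data.length; i++) {
            if (i == data.length || data[i] == 0x00) {
                result.add(unescape(Arrays.copyOfRange(data, start, i)));
                start = i + 1;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Two inner lists, one of which contains the separator byte itself.
        byte[] inner1 = encodeList(Arrays.asList(new byte[] {0x41, 0x00}, new byte[] {0x42}));
        byte[] inner2 = encodeList(Arrays.asList(new byte[] {0x01}));

        // Nest them: the outer list escapes the inner encodings again.
        byte[] outer = encodeList(Arrays.asList(inner1, inner2));

        List<byte[]> decoded = decodeList(outer);
        System.out.println("inner1 round-trips: " + Arrays.equals(inner1, decoded.get(0)));
        System.out.println("inner2 round-trips: " + Arrays.equals(inner2, decoded.get(1)));
    }
}
```

A MapLexicoder could plausibly reuse the same trick by treating each entry as an escaped key/value pair, but it would also need a deterministic entry order (for example, sorted by key) for the encoded bytes to sort consistently; that part is not shown here.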