accumulo-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Keith Turner <ke...@deenlo.com>
Subject Re: Map Lexicoder
Date Wed, 30 Dec 2015 00:16:35 GMT
Do you want all maps with the same keys to sort together?  If so doing
abc123 would make that happen.

The map data below

  { a : 1 , b : 2, c : 3 }
  { a : 1 , x : 8, y : 9 }
  { a : 4 , b : 5, c : 6 }

Would sort like the following if encoding key values pairs

  a1b2c3
  a1x8y9
  a4b5c6

If encoding all key and then all values, it would sort as follows

  abc123
  abc456
  axy189



On Tue, Dec 29, 2015 at 4:53 PM, Adam J. Shook <adamjshook@gmail.com> wrote:

> Agreed, I came to the same conclusion while implementing.  The final
> result that I have is a SortedMapLexicoder to avoid any comparisons going
> haywire.  Additionally, would it be best to encode the map as an array of
> keys followed by an array of values, or encode all key value pairs
> back-to-back:
>
> { a : 1 , b : 2, c : 3 } encoded as
>
> a1b2c3
> -or-
> abc123
>
> Feels like I should be encoding a list of keys, then the list of values,
> and then concatenating these two encoded byte arrays.  I think the end
> solution will be to support both?  I'm having a hard time reconciling which
> method is better, if any.  Hard to find some good examples of people who
> are sorting a list of maps.
>
> On Tue, Dec 29, 2015 at 2:47 PM, Keith Turner <keith@deenlo.com> wrote:
>
>>
>>
>> On Mon, Dec 28, 2015 at 11:47 AM, Adam J. Shook <adamjshook@gmail.com>
>> wrote:
>>
>>> Hello all,
>>>
>>> Any suggestions for using a Map Lexicoder (or implementing one)?  I am
>>> currently using a new ListLexicoder(new PairLexicoder(some lexicoder, some
>>> lexicoder), which is working for single maps.  However, when one of the
>>> lexicoders in the Pair is itself a Map (and therefore another
>>> ListLexicoder(PairLexicoder)), an exception is being thrown because
>>> ArrayList is not Comparable.
>>>
>>
>>
>> Since maps do not have a well defined order of keys and values,
>> comparison is tricky.   The purpose of Lexicoders is encode things in such
>> a way that the lexicographical comparison of the serialized data is
>> possible.  With a hashmap if I add the same data in the same order to two
>> different hash map instances, its possible that when iterating over those
>> maps I could see the data in different orders.   This could lead to two
>> maps constructed in the same way at different times (like different JVMs
>> with different implementations of HashMap) generating different data that
>> compare as different.  Ideally comparison of the two would yield equality.
>>
>> Something like LinkedHashMap does not have this problem for the same
>> insertion order.  If you want things to be comparable regardless of
>> insertion order (which I think is more intuitive), then SortedMap seems
>> like it would be a good candidate.  So maybe a SortedMapLexicoder would be
>> a better thing to offer?
>>
>>
>>> Regards,
>>> --Adam
>>>
>>
>>
>

Mime
View raw message