accumulo-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Adam J. Shook" <adamjsh...@gmail.com>
Subject Re: Map Lexicoder
Date Wed, 30 Dec 2015 16:41:51 GMT
I personally think all maps with the same keys should be sorted together,
but I think it'd be best to support and document both ways and leave that
up to the user.  I'm sure either way could be argued, and this is certainly
an edge case for lexicoders.

On Tue, Dec 29, 2015 at 7:16 PM, Keith Turner <keith@deenlo.com> wrote:

> Do you want all maps with the same keys to sort together?  If so doing
> abc123 would make that happen.
>
> The map data below
>
>   { a : 1 , b : 2, c : 3 }
>   { a : 1 , x : 8, y : 9 }
>   { a : 4 , b : 5, c : 6 }
>
> Would sort like the following if encoding key values pairs
>
>   a1b2c3
>   a1x8y9
>   a4b5c6
>
> If encoding all key and then all values, it would sort as follows
>
>   abc123
>   abc456
>   axy189
>
>
>
> On Tue, Dec 29, 2015 at 4:53 PM, Adam J. Shook <adamjshook@gmail.com>
> wrote:
>
>> Agreed, I came to the same conclusion while implementing.  The final
>> result that I have is a SortedMapLexicoder to avoid any comparisons going
>> haywire.  Additionally, would it be best to encode the map as an array of
>> keys followed by an array of values, or encode all key value pairs
>> back-to-back:
>>
>> { a : 1 , b : 2, c : 3 } encoded as
>>
>> a1b2c3
>> -or-
>> abc123
>>
>> Feels like I should be encoding a list of keys, then the list of values,
>> and then concatenating these two encoded byte arrays.  I think the end
>> solution will be to support both?  I'm having a hard time reconciling which
>> method is better, if any.  Hard to find some good examples of people who
>> are sorting a list of maps.
>>
>> On Tue, Dec 29, 2015 at 2:47 PM, Keith Turner <keith@deenlo.com> wrote:
>>
>>>
>>>
>>> On Mon, Dec 28, 2015 at 11:47 AM, Adam J. Shook <adamjshook@gmail.com>
>>> wrote:
>>>
>>>> Hello all,
>>>>
>>>> Any suggestions for using a Map Lexicoder (or implementing one)?  I am
>>>> currently using a new ListLexicoder(new PairLexicoder(some lexicoder, some
>>>> lexicoder), which is working for single maps.  However, when one of the
>>>> lexicoders in the Pair is itself a Map (and therefore another
>>>> ListLexicoder(PairLexicoder)), an exception is being thrown because
>>>> ArrayList is not Comparable.
>>>>
>>>
>>>
>>> Since maps do not have a well defined order of keys and values,
>>> comparison is tricky.   The purpose of Lexicoders is encode things in such
>>> a way that the lexicographical comparison of the serialized data is
>>> possible.  With a hashmap if I add the same data in the same order to two
>>> different hash map instances, its possible that when iterating over those
>>> maps I could see the data in different orders.   This could lead to two
>>> maps constructed in the same way at different times (like different JVMs
>>> with different implementations of HashMap) generating different data that
>>> compare as different.  Ideally comparison of the two would yield equality.
>>>
>>> Something like LinkedHashMap does not have this problem for the same
>>> insertion order.  If you want things to be comparable regardless of
>>> insertion order (which I think is more intuitive), then SortedMap seems
>>> like it would be a good candidate.  So maybe a SortedMapLexicoder would be
>>> a better thing to offer?
>>>
>>>
>>>> Regards,
>>>> --Adam
>>>>
>>>
>>>
>>
>

Mime
View raw message