hbase-user mailing list archives

From James Taylor <jtay...@salesforce.com>
Subject Re: HBase Types: Explicit Null Support
Date Mon, 01 Apr 2013 23:31:23 GMT
From the SQL perspective, handling null is important. Phoenix represents
null in the following ways:
- the absence of a key value
- an empty value in a key value
- an empty value in a multi-part row key
   - for variable-length types (VARCHAR and DECIMAL), a null byte
separator would be used if the column is not the last one
   - for fixed-width types, only the last column is allowed to be null

As you mentioned, it's important to maintain the lexicographical sort
order, with nulls sorting first.
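
To illustrate the multi-part row key case, here is a minimal sketch of how an empty value plus a null byte separator makes nulls sort first. It is not Phoenix's actual code: the class RowKeySketch and its methods are hypothetical names, and it assumes the values themselves never contain a 0x00 byte.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not Phoenix's actual implementation.
// Assumes no part contains a 0x00 byte.
final class RowKeySketch {

    // Joins variable-length parts with a 0x00 separator. A null part is
    // written as an empty value; no separator follows the last part.
    static byte[] encode(String... parts) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < parts.length; i++) {
            if (parts[i] != null) {
                byte[] b = parts[i].getBytes(StandardCharsets.UTF_8);
                out.write(b, 0, b.length);
            }
            if (i < parts.length - 1) {
                out.write(0x00);
            }
        }
        return out.toByteArray();
    }

    // Unsigned lexicographic byte comparison, the order in which HBase
    // sorts row keys.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        byte[] nullFirstColumn = encode(null, "b"); // bytes: 0x00 'b'
        byte[] bothColumnsSet = encode("a", "b");   // bytes: 'a' 0x00 'b'
        // A negative result means the key with the null column sorts first.
        System.out.println(compareUnsigned(nullFirstColumn, bothColumnsSet));
    }
}
```

Because the separator is the lowest possible byte, a key whose leading column is empty compares lower than any key with a non-empty value in that column.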

On 04/01/2013 01:32 PM, Nick Dimiduk wrote:
> Thanks for the thoughtful response (and code!).
> I'm thinking I will press forward with a base implementation that does not
> support nulls. The idea is to provide an extensible set of interfaces, so I
> think this will not box us into a corner later. That is, a mirroring
> package could be implemented that supports null values and accepts
> the relevant trade-offs.
> Thanks,
> Nick
> On Mon, Apr 1, 2013 at 12:26 PM, Matt Corgan <mcorgan@hotpads.com> wrote:
>> I spent some time this weekend extracting bits of our serialization code to
>> a public github repo at http://github.com/hotpads/data-tools.
>> Contributions are welcome - I'm sure we all have this stuff lying around.
>> You can see I've bumped into the NULL problem in a few places:
>> *
>> https://github.com/hotpads/data-tools/blob/master/src/main/java/com/hotpads/data/primitive/lists/LongArrayList.java
>> *
>> https://github.com/hotpads/data-tools/blob/master/src/main/java/com/hotpads/data/types/floats/DoubleByteTool.java
>> Looking back, I think my latest opinion on the topic is to reject
>> nullability as the rule since it can cause unexpected behavior and
>> confusion.  It's cleaner to provide a wrapper class (so both LongArrayList
>> plus NullableLongArrayList) that explicitly defines the behavior, and costs
>> a little more in performance.  If the user can't find a pre-made wrapper
>> class, it's not very difficult for each user to provide their own
>> interpretation of null and check for it themselves.
>> If you reject nullability, the question becomes what to do in situations
>> where you're implementing existing interfaces that accept nullable params.
>> The LongArrayList above implements List<Long>, which requires an add(Long)
>> method.  In the above implementation I chose to swap nulls with
>> Long.MIN_VALUE, however I'm now thinking it best to force the user to make
>> that swap and then throw IllegalArgumentException if they pass null.
>> On Mon, Apr 1, 2013 at 11:41 AM, Doug Meil <doug.meil@explorysmedical.com> wrote:
>>> Hmmm... good question.
>>> I think that fixed-width support is important for a great many rowkey
>>> constructs, so I'd rather see something like losing MIN_VALUE and
>>> keeping fixed width.
>>> On 4/1/13 2:00 PM, "Nick Dimiduk" <ndimiduk@gmail.com> wrote:
>>>> Heya,
>>>> Thinking about data types and serialization. I think null support is an
>>>> important characteristic for the serialized representations, especially
>>>> when considering the compound type. However, doing so is directly
>>>> incompatible with fixed-width representations for numerics. For instance,
>>>> if we want to have a fixed-width signed long stored on 8 bytes, where do
>>>> you put null? float and double types can cheat a little by folding negative
>>>> and positive NaNs into a single representation (this isn't strictly
>>>> correct!), leaving a place to represent null. In the long example case, the
>>>> obvious choice is to reduce MAX_VALUE or increase MIN_VALUE by one. This
>>>> will free up an additional encoding which can be used for null. My
>>>> experience working with scientific data, however, makes me wince at the
>>>> idea.
>>>> The variable-width encodings have it a little easier. There's already
>>>> enough going on that it's simpler to make room.
>>>> Remember, the final goal is to support order-preserving serialization. This
>>>> imposes some limitations on our encoding strategies. For instance, it's not
>>>> enough to simply encode null, it really needs to be encoded as 0x00 so as
>>>> to sort lexicographically earlier than any other value.
>>>> What do you think? Any ideas, experiences, etc?
>>>> Thanks,
>>>> Nick
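
The fixed-width trade-off discussed in the thread (giving up MIN_VALUE to make room for null) can be sketched as follows. This is one possible encoding under stated assumptions, not code from HBase, Phoenix, or data-tools: the class NullableLongSketch is a hypothetical name, values are stored big-endian with the sign bit flipped so that unsigned byte order matches numeric order, and the all-zero encoding that Long.MIN_VALUE would otherwise occupy is reserved for null.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration only; not taken from any of the projects above.
final class NullableLongSketch {

    // Encodes a nullable long on exactly 8 bytes. Flipping the sign bit makes
    // unsigned lexicographic byte order agree with signed numeric order.
    // Long.MIN_VALUE would map to eight 0x00 bytes, so that slot is given to
    // null and Long.MIN_VALUE itself is rejected.
    static byte[] encode(Long v) {
        if (v == null) {
            return new byte[8]; // all zeros: sorts before every other value
        }
        if (v == Long.MIN_VALUE) {
            throw new IllegalArgumentException("Long.MIN_VALUE is reserved for null");
        }
        return ByteBuffer.allocate(8).putLong(v ^ Long.MIN_VALUE).array();
    }

    // Reverses encode(): the all-zero encoding decodes to null.
    static Long decode(byte[] b) {
        long raw = ByteBuffer.wrap(b).getLong();
        return raw == 0L ? null : Long.valueOf(raw ^ Long.MIN_VALUE);
    }

    // Unsigned lexicographic byte comparison.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        // Negative results mean the left-hand value sorts first.
        System.out.println(compareUnsigned(encode(null), encode(Long.MIN_VALUE + 1)));
        System.out.println(compareUnsigned(encode(-1L), encode(0L)));
        System.out.println(decode(encode(42L)));
    }
}
```

The cost is the one raised in the thread: one legitimate value (here Long.MIN_VALUE) can no longer be stored, which is why the sketch throws rather than silently swapping it.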
