hbase-user mailing list archives

From Michel Segel <michael_se...@hotmail.com>
Subject Re: HBase Types: Explicit Null Support
Date Tue, 02 Apr 2013 02:40:03 GMT
Silly question...
Null support. In a system where a column may or may not exist, how do you support null?

;-)

In terms of a key,  it's a primary key and can't be null.  


So what am I missing?


Sent from a remote device. Please excuse any typos...

Mike Segel

On Apr 1, 2013, at 10:26 PM, Nick Dimiduk <ndimiduk@gmail.com> wrote:

> Furthermore, is it more important to support null values than to squeeze all
> representations into minimum size (4 bytes for int32, etc.)?
> On Apr 1, 2013 4:41 PM, "Nick Dimiduk" <ndimiduk@gmail.com> wrote:
> 
>> On Mon, Apr 1, 2013 at 4:31 PM, James Taylor <jtaylor@salesforce.com> wrote:
>> 
>>> From the SQL perspective, handling null is important.
>> 
>> 
>> From your perspective, is it critical to support NULLs, even at the
>> expense of fixed-width encodings altogether, or of being able to represent
>> the full range of values? That is, you'd rather be able to represent NULL
>> than -2^31?
>> 
>> On 04/01/2013 01:32 PM, Nick Dimiduk wrote:
>>> 
>>>> Thanks for the thoughtful response (and code!).
>>>> 
>>>> I'm thinking I will press forward with a base implementation that does not
>>>> support nulls. The idea is to provide an extensible set of interfaces, so I
>>>> think this will not box us into a corner later. That is, a mirroring
>>>> package could be implemented that supports null values and accepts
>>>> the relevant trade-offs.
>>>> 
>>>> Thanks,
>>>> Nick
>>>> 
>>>> On Mon, Apr 1, 2013 at 12:26 PM, Matt Corgan <mcorgan@hotpads.com>
>>>> wrote:
>>>> 
>>>>> I spent some time this weekend extracting bits of our serialization code to
>>>>> a public GitHub repo at http://github.com/hotpads/data-tools.
>>>>> Contributions are welcome - I'm sure we all have this stuff lying around.
>>>>> 
>>>>> You can see I've bumped into the NULL problem in a few places:
>>>>> * https://github.com/hotpads/data-tools/blob/master/src/main/java/com/hotpads/data/primitive/lists/LongArrayList.java
>>>>> * https://github.com/hotpads/data-tools/blob/master/src/main/java/com/hotpads/data/types/floats/DoubleByteTool.java
>>>>> 
>>>>> Looking back, I think my latest opinion on the topic is to reject
>>>>> nullability as the rule since it can cause unexpected behavior and
>>>>> confusion.  It's cleaner to provide a wrapper class (so both LongArrayList
>>>>> plus NullableLongArrayList) that explicitly defines the behavior, and costs
>>>>> a little more in performance.  If the user can't find a pre-made wrapper
>>>>> class, it's not very difficult for each user to provide their own
>>>>> interpretation of null and check for it themselves.
>>>>> 
>>>>> If you reject nullability, the question becomes what to do in situations
>>>>> where you're implementing existing interfaces that accept nullable params.
>>>>> The LongArrayList above implements List<Long>, which requires an add(Long)
>>>>> method.  In the above implementation I chose to swap nulls with
>>>>> Long.MIN_VALUE, however I'm now thinking it best to force the user to make
>>>>> that swap and then throw IllegalArgumentException if they pass null.
>>>>> 
>>>>> 
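The "throw on null" behaviour described above could look roughly like the sketch below. It is not the code in the data-tools repo: the class name StrictLongArrayList is hypothetical, and extending AbstractList instead of implementing List<Long> directly is an assumption made to keep the example short.

```java
import java.util.AbstractList;
import java.util.Arrays;

// Hypothetical sketch: a List<Long> backed by a primitive long[] that
// refuses null instead of silently swapping it for Long.MIN_VALUE.
final class StrictLongArrayList extends AbstractList<Long> {
    private long[] values = new long[8];
    private int size = 0;

    @Override
    public boolean add(Long value) {
        if (value == null) {
            // The caller must pick its own sentinel before calling add().
            throw new IllegalArgumentException("null is not supported");
        }
        if (size == values.length) {
            values = Arrays.copyOf(values, size * 2); // grow the backing array
        }
        values[size++] = value; // auto-unboxed to a primitive long
        modCount++;             // keep AbstractList's fail-fast iterators working
        return true;
    }

    @Override
    public Long get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        return values[index];
    }

    @Override
    public int size() {
        return size;
    }
}
```

A caller that wants nulls would map them to a value of its own choosing (for example Long.MIN_VALUE) before calling add(), which keeps that decision visible at the call site.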
>>>>> On Mon, Apr 1, 2013 at 11:41 AM, Doug Meil <doug.meil@explorysmedical.com> wrote:
>>>>> 
>>>>>> Hmmm... good question.
>>>>>> 
>>>>>> I think that fixed-width support is important for a great many rowkey
>>>>>> construction cases, so I'd rather see something like losing MIN_VALUE and
>>>>>> keeping fixed width.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On 4/1/13 2:00 PM, "Nick Dimiduk" <ndimiduk@gmail.com> wrote:
>>>>>> 
>>>>>> Heya,
>>>>>>> 
>>>>>>> Thinking about data types and serialization. I think null support is an
>>>>>>> important characteristic for the serialized representations, especially
>>>>>>> when considering the compound type. However, doing so is directly
>>>>>>> incompatible with fixed-width representations for numerics. For instance,
>>>>>>> if we want to have a fixed-width signed long stored on 8 bytes, where do
>>>>>>> you put null? float and double types can cheat a little by folding negative
>>>>>>> and positive NaNs into a single representation (this isn't strictly
>>>>>>> correct!), leaving a place to represent null. In the long example case, the
>>>>>>> obvious choice is to reduce MAX_VALUE or increase MIN_VALUE by one. This
>>>>>>> will allocate an additional encoding which can be used for null. My
>>>>>>> experience working with scientific data, however, makes me wince at the
>>>>>>> idea.
>>>>>>> 
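The reserved-value option for a fixed-width long can be sketched as follows. This is only an illustration under stated assumptions, not code from the thread: the class name FixedWidthNullableLong is hypothetical, and it gives up Long.MIN_VALUE (rather than MAX_VALUE) because, once the sign bit is flipped for sorting, MIN_VALUE is the value that would occupy the all-zero encoding.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: 8-byte, order-preserving encoding of a nullable long.
// Long.MIN_VALUE is sacrificed so that eight zero bytes can mean null.
final class FixedWidthNullableLong {
    private FixedWidthNullableLong() {}

    static byte[] encode(Long value) {
        if (value == null) {
            return new byte[8]; // all zeros: sorts before every other value
        }
        if (value == Long.MIN_VALUE) {
            throw new IllegalArgumentException("Long.MIN_VALUE is reserved for null");
        }
        // Flipping the sign bit makes unsigned byte order match signed numeric order.
        return ByteBuffer.allocate(8).putLong(value ^ Long.MIN_VALUE).array();
    }

    static Long decode(byte[] bytes) {
        if (bytes.length != 8) {
            throw new IllegalArgumentException("expected exactly 8 bytes");
        }
        long raw = ByteBuffer.wrap(bytes).getLong(); // big-endian by default
        return raw == 0L ? null : raw ^ Long.MIN_VALUE;
    }
}
```

The cost is exactly the one objected to above: one legitimate value can no longer be stored, and the encoder has to reject it rather than store it silently as null.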
>>>>>>> The variable-width encodings have it a little easier. There's already
>>>>>>> enough going on that it's simpler to make room.
>>>>>>> 
>>>>>>> Remember, the final goal is to support order-preserving serialization. This
>>>>>>> imposes some limitations on our encoding strategies. For instance, it's not
>>>>>>> enough to simply encode null, it really needs to be encoded as 0x00 so as
>>>>>>> to sort lexicographically earlier than any other value.
>>>>>>> 
>>>>>>> What do you think? Any ideas, experiences, etc?
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Nick
>> 
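For comparison, the other side of the trade-off (keep the full range of long, give up the minimum 8-byte width) might be sketched like this. Nobody in the thread proposed this exact layout: the one-byte header, the 0x01 marker for non-null values, and the names OrderedNullableLong and compareUnsigned are all assumptions made for illustration.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: null is the single byte 0x00; every non-null value is
// 0x01 followed by the sign-flipped big-endian long (9 bytes in total).
final class OrderedNullableLong {
    private OrderedNullableLong() {}

    static byte[] encode(Long value) {
        if (value == null) {
            return new byte[] {0x00};
        }
        return ByteBuffer.allocate(9)
                .put((byte) 0x01)
                .putLong(value ^ Long.MIN_VALUE) // sign flip preserves numeric order
                .array();
    }

    static Long decode(byte[] bytes) {
        if (bytes.length == 1 && bytes[0] == 0x00) {
            return null;
        }
        if (bytes.length == 9 && bytes[0] == 0x01) {
            return ByteBuffer.wrap(bytes, 1, 8).getLong() ^ Long.MIN_VALUE;
        }
        throw new IllegalArgumentException("not a valid encoding");
    }

    // Lexicographic comparison of unsigned bytes, the order in which
    // row keys are sorted.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xFF;
            int y = b[i] & 0xFF;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }
}
```

With this layout null sorts before Long.MIN_VALUE, and Long.MIN_VALUE remains representable, at the price of one extra byte per non-null value.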
