hbase-dev mailing list archives

From James Taylor <jtay...@salesforce.com>
Subject Re: HBase Types: Explicit Null Support
Date Tue, 02 Apr 2013 00:39:33 GMT
Since SQL allows null-valued composite key parts, we needed to support them.

On 04/01/2013 05:10 PM, Ted Yu wrote:
> bq. I create a dummy qualifier with a dummy value
>
> For any single application, the above can be done.
> For generic applications, how would we do this ?
>
> Thanks
>
>
> On Mon, Apr 1, 2013 at 5:07 PM, Matt Corgan <mcorgan@hotpads.com> wrote:
>
>> I generally don't allow nulls in my composite row keys.  Does SQL allow
>> nulls in the PK?  In the rare case I wanted to do that I might create a
>> separate format called NullableCInt32 with 5 bytes, where the first byte
>> indicates null. It's important to keep the pure types pure.
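Matt's NullableCInt32 is not shown in the thread. The following is a minimal sketch of such a 5-byte layout, assuming a leading flag byte (0x00 for null, 0x01 for present) followed by the 4-byte big-endian int with its sign bit flipped so that unsigned byte order matches numeric order. The class name `NullableInt32` and the flag values are illustrative assumptions, not an HBase API.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical 5-byte nullable, order-preserving int32 encoding:
// byte 0 is a null flag (0x00 = null, 0x01 = present); bytes 1-4 hold the
// value big-endian with the sign bit flipped, so comparing arrays as
// unsigned bytes matches numeric order and null sorts first.
final class NullableInt32 {
    static final int LENGTH = 5;
    private static final byte NULL_FLAG = 0x00;
    private static final byte PRESENT_FLAG = 0x01;

    static byte[] encode(Integer value) {
        byte[] out = new byte[LENGTH];
        if (value == null) {
            out[0] = NULL_FLAG; // remaining bytes stay 0x00
            return out;
        }
        out[0] = PRESENT_FLAG;
        // Flipping the sign bit maps Integer.MIN_VALUE..MAX_VALUE onto
        // 0x00000000..0xFFFFFFFF, preserving order.
        ByteBuffer.wrap(out, 1, 4).putInt(value ^ Integer.MIN_VALUE);
        return out;
    }

    static Integer decode(byte[] in) {
        if (in.length != LENGTH) throw new IllegalArgumentException("expected 5 bytes");
        if (in[0] == NULL_FLAG) return null;
        return ByteBuffer.wrap(in, 1, 4).getInt() ^ Integer.MIN_VALUE;
    }

    public static void main(String[] args) {
        byte[] nul = encode(null);
        byte[] min = encode(Integer.MIN_VALUE);
        byte[] zero = encode(0);
        // Arrays.compareUnsigned (Java 9+) compares byte-by-byte as unsigned.
        System.out.println(Arrays.compareUnsigned(nul, min) < 0);  // true
        System.out.println(Arrays.compareUnsigned(min, zero) < 0); // true
        System.out.println(decode(min));                           // -2147483648
    }
}
```

The full int range survives; the cost is one extra byte per value, which is the trade-off Matt accepts by keeping it a separate type.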
>>
>> I do have lots of null *values*, however; they're represented by the
>> absence of a qualifier in the Put.  If a row has all null values, I create a dummy
>> qualifier with a dummy value to make sure the row key gets inserted as it
>> would in sql.
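The "null value = missing qualifier" convention can be sketched without an HBase dependency by modeling a Put as a sorted map of qualifier to value. This is a hypothetical model, not the HBase client API; the placeholder qualifier `_` and value `{0}` are assumed names standing in for Matt's "dummy qualifier with a dummy value".

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of "null = no cell": a TreeMap of qualifier -> value
// stands in for the cells of an HBase Put.
final class SparseRowWriter {
    // Assumed placeholder qualifier/value; any reserved name would do.
    static final String DUMMY_QUALIFIER = "_";
    static final byte[] DUMMY_VALUE = {0};

    static TreeMap<String, byte[]> toCells(Map<String, String> columns) {
        TreeMap<String, byte[]> cells = new TreeMap<>();
        for (Map.Entry<String, String> e : columns.entrySet()) {
            if (e.getValue() == null) continue; // null value: write no cell at all
            cells.put(e.getKey(), e.getValue().getBytes(StandardCharsets.UTF_8));
        }
        // An HBase row only exists if it has at least one cell, so when every
        // column is null, write a placeholder so the row key is still present.
        if (cells.isEmpty()) {
            cells.put(DUMMY_QUALIFIER, DUMMY_VALUE);
        }
        return cells;
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("name", "alice");
        row.put("email", null);
        System.out.println(toCells(row).keySet()); // [name]

        Map<String, String> allNull = new LinkedHashMap<>();
        allNull.put("email", null);
        System.out.println(toCells(allNull).keySet()); // [_]
    }
}
```

As Ted points out in reply, the hard part for a generic library is agreeing on which qualifier is reserved for the placeholder.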
>>
>>
>> On Mon, Apr 1, 2013 at 4:49 PM, James Taylor <jtaylor@salesforce.com>
>> wrote:
>>
>>> On 04/01/2013 04:41 PM, Nick Dimiduk wrote:
>>>
>>>> On Mon, Apr 1, 2013 at 4:31 PM, James Taylor <jtaylor@salesforce.com>
>>>> wrote:
>>>>
>>>>> From the SQL perspective, handling null is important.
>>>>
>>>> From your perspective, it is critical to support NULLs, even at the
>>>> expense of having no fixed-width encodings at all or of not representing
>>>> the full range of values. That is, you'd rather be able to represent
>>>> NULL than -2^31?
>>>>
>>> We've been able to get away with supporting NULL through the absence of
>>> the value rather than restricting the data range. We haven't had any
>>> pushback on not allowing a fixed-width nullable leading row key column.
>>> Since our variable-length DECIMAL supports null and is a superset of the
>>> fixed-width numeric types, users have a reasonable alternative.
>>>
>>> I'd rather not restrict the range of values, since it doesn't seem like
>>> this would be necessary.
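One way "null through absence" can work in a composite row key is sketched below, assuming variable-length UTF-8 fields that are each terminated by a 0x00 separator byte and never contain 0x00 themselves. A null field contributes zero bytes before its separator, so it sorts ahead of any non-null value in that position. A fixed-width field has no separator to leave empty, which is why a nullable fixed-width leading column has no natural encoding here. The class `CompositeKey` and this exact layout are assumptions for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical composite key of variable-length string fields where null is
// represented by absence: the field writes no bytes, only the 0x00 separator.
// Assumes field values never contain the 0x00 byte.
final class CompositeKey {
    private static final byte SEPARATOR = 0x00;

    static byte[] encode(String... fields) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String f : fields) {
            if (f != null) {
                byte[] b = f.getBytes(StandardCharsets.UTF_8);
                out.write(b, 0, b.length);
            }
            out.write(SEPARATOR); // null or not, every field is terminated
        }
        return out.toByteArray();
    }

    static List<String> decode(byte[] key) {
        List<String> fields = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < key.length; i++) {
            if (key[i] == SEPARATOR) {
                // An empty field decodes as null, so "" and null are indistinguishable.
                fields.add(i == start ? null : new String(key, start, i - start, StandardCharsets.UTF_8));
                start = i + 1;
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        byte[] withNull = encode("acme", null, "x");
        byte[] withValue = encode("acme", "a", "x");
        System.out.println(decode(withNull)); // [acme, null, x]
        // Null sorts before any non-null value in the same position.
        System.out.println(Arrays.compareUnsigned(withNull, withValue) < 0); // true
    }
}
```

A consequence of this sketch is that the empty string and null share one encoding; the data range is otherwise untouched.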
>>>
>>>
>>>> On 04/01/2013 01:32 PM, Nick Dimiduk wrote:
>>>>
>>>>>> Thanks for the thoughtful response (and code!).
>>>>>>
>>>>>> I'm thinking I will press forward with a base implementation that does
>>>>>> not support nulls. The idea is to provide an extensible set of
>>>>>> interfaces, so I think this will not box us into a corner later. That
>>>>>> is, a mirroring package could be implemented that supports null values
>>>>>> and accepts the relevant trade-offs.
>>>>>>
>>>>>> Thanks,
>>>>>> Nick
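Nick's plan (a non-null base implementation behind extensible interfaces, with nullability added by a separate mirroring layer) can be sketched as an interface, a full-range non-null type, and a generic decorator. All names here (`OrderedType`, `OrderedInt64`, `Nullable`) are hypothetical and do not come from HBase; the decorator's one-byte null flag is one assumed trade-off among several possible.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical names throughout; this is not HBase's API.
final class TypesSketch {
    // Extensible base interface: an order-preserving codec for one type.
    interface OrderedType<T> {
        byte[] encode(T value);
        T decode(byte[] bytes);
    }

    // Base implementation: fixed 8 bytes, full long range, no null support.
    static final class OrderedInt64 implements OrderedType<Long> {
        public byte[] encode(Long value) {
            if (value == null) throw new IllegalArgumentException("null not supported");
            // Flip the sign bit so unsigned byte order matches numeric order.
            return ByteBuffer.allocate(8).putLong(value ^ Long.MIN_VALUE).array();
        }
        public Long decode(byte[] bytes) {
            return ByteBuffer.wrap(bytes).getLong() ^ Long.MIN_VALUE;
        }
    }

    // "Mirroring" wrapper: adds a leading flag byte to any base type, so
    // null (0x00) sorts before every value (0x01 prefix). The trade-off is
    // one extra byte per value and a variable-width encoding.
    static final class Nullable<T> implements OrderedType<T> {
        private final OrderedType<T> delegate;
        Nullable(OrderedType<T> delegate) { this.delegate = delegate; }

        public byte[] encode(T value) {
            if (value == null) return new byte[] {0x00};
            byte[] body = delegate.encode(value);
            byte[] out = new byte[body.length + 1];
            out[0] = 0x01;
            System.arraycopy(body, 0, out, 1, body.length);
            return out;
        }
        public T decode(byte[] bytes) {
            if (bytes[0] == 0x00) return null;
            return delegate.decode(Arrays.copyOfRange(bytes, 1, bytes.length));
        }
    }

    public static void main(String[] args) {
        OrderedType<Long> nullableLong = new Nullable<>(new OrderedInt64());
        System.out.println(nullableLong.decode(nullableLong.encode(null)));           // null
        System.out.println(nullableLong.decode(nullableLong.encode(Long.MIN_VALUE))); // -9223372036854775808
    }
}
```

Because the base types never see null, they keep their full range and fixed width; only callers who opt into the wrapper pay for nullability.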
>>>>>>
>>>>>> On Mon, Apr 1, 2013 at 12:26 PM, Matt Corgan <mcorgan@hotpads.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I spent some time this weekend extracting bits of our serialization code
>>>>>>> to a public GitHub repo at http://github.com/hotpads/data-tools.
>>>>>>> Contributions are welcome; I'm sure we all have this stuff lying
>>>>>>> around.
>>>>>>>
>>>>>>> You can see I've bumped into the NULL problem in a few places:
>>>>>>>
>>>>>>> * https://github.com/hotpads/data-tools/blob/master/src/main/java/com/hotpads/data/primitive/lists/LongArrayList.java
>>>>>>> * https://github.com/hotpads/data-tools/blob/master/src/main/java/com/hotpads/data/types/floats/DoubleByteTool.java
>>>>>>>
>>>>>>> Looking back, my latest opinion on the topic is to reject nullability
>>>>>>> as the rule, since it can cause unexpected behavior and confusion. It's
>>>>>>> cleaner to provide a wrapper class (so both LongArrayList and
>>>>>>> NullableLongArrayList) that explicitly defines the behavior and costs a
>>>>>>> little more in performance. If users can't find a pre-made wrapper
>>>>>>> class, it's not very difficult for each of them to provide their own
>>>>>>> interpretation of null and check for it themselves.
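A wrapper that "explicitly defines the behavior" might look like the sketch below: values live in a primitive `long[]` and a separate `BitSet` marks which indexes are null, so `Long.MIN_VALUE` remains an ordinary value. This is a hypothetical illustration, not the hotpads `NullableLongArrayList`; the extra bitmap lookup is one plausible source of the small performance cost Matt mentions.

```java
import java.util.AbstractList;
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical NullableLongArrayList: primitive storage plus a null bitmap,
// so no long value has to be sacrificed as a null sentinel.
final class NullableLongArrayList extends AbstractList<Long> {
    private long[] values = new long[8];
    private final BitSet nulls = new BitSet();
    private int size;

    @Override
    public boolean add(Long value) {
        if (size == values.length) values = Arrays.copyOf(values, size * 2);
        if (value == null) {
            nulls.set(size);
            values[size] = 0L; // placeholder; never returned
        } else {
            nulls.clear(size);
            values[size] = value;
        }
        size++;
        modCount++;
        return true;
    }

    @Override
    public Long get(int index) {
        if (index < 0 || index >= size) throw new IndexOutOfBoundsException("index " + index);
        // The null bitmap, not the stored long, decides whether the slot is null.
        return nulls.get(index) ? null : values[index];
    }

    @Override
    public int size() { return size; }

    public static void main(String[] args) {
        NullableLongArrayList list = new NullableLongArrayList();
        list.add(Long.MIN_VALUE);
        list.add(null);
        list.add(42L);
        System.out.println(list); // [-9223372036854775808, null, 42]
    }
}
```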
>>>>>>>
>>>>>>> If you reject nullability, the question becomes what to do when you're
>>>>>>> implementing existing interfaces that accept nullable parameters. The
>>>>>>> LongArrayList above implements List<Long>, which requires an add(Long)
>>>>>>> method. In that implementation I chose to swap nulls with
>>>>>>> Long.MIN_VALUE; however, I'm now thinking it's best to force the user
>>>>>>> to make that swap and to throw IllegalArgumentException if they pass
>>>>>>> null.
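The stricter policy Matt describes can be sketched as a `List<Long>` whose `add(Long)` rejects null instead of silently mapping it to `Long.MIN_VALUE`, leaving the substitution to the caller. `StrictLongArrayList` is a hypothetical name. Note that the `java.util.List.add` contract documents `NullPointerException` for lists that do not permit null elements and `IllegalArgumentException` for elements rejected because of some property, so either exception is defensible; the sketch follows the email's choice.

```java
import java.util.AbstractList;
import java.util.Arrays;

// Hypothetical strict variant: List<Long> forces add(Long) to exist, but
// null is rejected rather than silently mapped to a sentinel.
final class StrictLongArrayList extends AbstractList<Long> {
    private long[] values = new long[8];
    private int size;

    @Override
    public boolean add(Long value) {
        if (value == null) {
            throw new IllegalArgumentException("null not allowed; substitute a sentinel explicitly");
        }
        if (size == values.length) values = Arrays.copyOf(values, size * 2);
        values[size++] = value;
        modCount++;
        return true;
    }

    @Override
    public Long get(int index) {
        if (index < 0 || index >= size) throw new IndexOutOfBoundsException("index " + index);
        return values[index];
    }

    @Override
    public int size() { return size; }

    public static void main(String[] args) {
        StrictLongArrayList list = new StrictLongArrayList();
        Long maybeNull = null;
        // The caller makes the null-to-sentinel swap explicitly.
        list.add(maybeNull == null ? Long.MIN_VALUE : maybeNull);
        try {
            list.add(null);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        System.out.println(list.size()); // 1
    }
}
```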
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Apr 1, 2013 at 11:41 AM, Doug Meil <doug.meil@explorysmedical.com>
>>>>>>> wrote:
>>>>>>>> Hmm... good question.
>>>>>>>>
>>>>>>>> I think that fixed-width support is important for a great many row key
>>>>>>>> construction cases, so I'd rather see something like losing MIN_VALUE
>>>>>>>> and keeping fixed width.
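Doug's "lose MIN_VALUE, keep fixed width" option fits neatly with a sign-bit-flipped big-endian long: `Long.MIN_VALUE` encodes to eight 0x00 bytes, so reserving it for null yields a null encoding that is all zeros and sorts before every other value, which is the property Nick asks for in his original message. The sketch below assumes exactly that scheme; `FixedNullableInt64` is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical fixed-width nullable long: 8 bytes, big-endian, sign bit
// flipped for ordering. Long.MIN_VALUE would encode to all-zero bytes, so
// that one value is given up and the all-zero encoding is reused for null.
final class FixedNullableInt64 {
    static final byte[] NULL_ENCODING = new byte[8]; // 0x00 x 8

    static byte[] encode(Long value) {
        if (value == null) return NULL_ENCODING.clone();
        if (value == Long.MIN_VALUE) {
            throw new IllegalArgumentException("Long.MIN_VALUE is reserved for null");
        }
        return ByteBuffer.allocate(8).putLong(value ^ Long.MIN_VALUE).array();
    }

    static Long decode(byte[] bytes) {
        if (Arrays.equals(bytes, NULL_ENCODING)) return null;
        return ByteBuffer.wrap(bytes).getLong() ^ Long.MIN_VALUE;
    }

    public static void main(String[] args) {
        byte[] nul = encode(null);
        byte[] smallest = encode(Long.MIN_VALUE + 1);
        System.out.println(Arrays.toString(smallest));                 // [0, 0, 0, 0, 0, 0, 0, 1]
        System.out.println(Arrays.compareUnsigned(nul, smallest) < 0); // true
        System.out.println(decode(nul));                               // null
    }
}
```

The cost is a single unrepresentable value, -2^63, in exchange for keeping every encoding exactly 8 bytes.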
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 4/1/13 2:00 PM, "Nick Dimiduk" <ndimiduk@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Heya,
>>>>>>>>>
>>>>>>>>> Thinking about data types and serialization. I think null support is
>>>>>>>>> an important characteristic for the serialized representations,
>>>>>>>>> especially when considering the compound type. However, supporting
>>>>>>>>> null is directly incompatible with fixed-width representations for
>>>>>>>>> numerics. For instance, if we want to have a fixed-width signed long
>>>>>>>>> stored in 8 bytes, where do you put null? float and double types can
>>>>>>>>> cheat a little by folding negative and positive NaNs into a single
>>>>>>>>> representation (this isn't strictly correct!), leaving a place to
>>>>>>>>> represent null. In the long example case, the obvious choice is to
>>>>>>>>> reduce MAX_VALUE or increase MIN_VALUE by one. This will allocate an
>>>>>>>>> additional encoding which can be used for null. My experience working
>>>>>>>>> with scientific data, however, makes me wince at the idea.
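The NaN-folding trick for doubles can be made concrete with a common order-preserving transform, assumed here rather than taken from the thread: flip every bit of a negative double and only the sign bit of a non-negative one. `Double.doubleToLongBits` already collapses every NaN to a single canonical bit pattern, so the all-ones negative NaN, which would otherwise encode to all zeros, never occurs, and the all-zero encoding is free for null. `FixedNullableFloat64` is a hypothetical name; losing NaN payloads is the "isn't strictly correct" part.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical fixed-width nullable double. Double.doubleToLongBits collapses
// every NaN (positive or negative) to one canonical bit pattern, which frees
// the all-zero encoding for null.
final class FixedNullableFloat64 {
    static byte[] encode(Double value) {
        long encoded = 0L; // reserved for null
        if (value != null) {
            long bits = Double.doubleToLongBits(value); // canonical NaN
            // Negative numbers: flip every bit so larger magnitudes sort first.
            // Non-negative numbers: flip only the sign bit so they sort after negatives.
            encoded = bits < 0 ? ~bits : bits ^ Long.MIN_VALUE;
        }
        return ByteBuffer.allocate(8).putLong(encoded).array();
    }

    static Double decode(byte[] bytes) {
        long encoded = ByteBuffer.wrap(bytes).getLong();
        if (encoded == 0L) return null;
        // Sign bit set in the encoding means the original was non-negative.
        long bits = encoded < 0 ? encoded ^ Long.MIN_VALUE : ~encoded;
        return Double.longBitsToDouble(bits);
    }

    public static void main(String[] args) {
        byte[] nul = encode(null);
        byte[] negInf = encode(Double.NEGATIVE_INFINITY);
        byte[] one = encode(1.0);
        System.out.println(Arrays.compareUnsigned(nul, negInf) < 0); // true
        System.out.println(Arrays.compareUnsigned(negInf, one) < 0); // true
        System.out.println(decode(encode(Double.NaN)));              // NaN
        System.out.println(decode(nul));                             // null
    }
}
```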
>>>>>>>>>
>>>>>>>>> The variable-width encodings have it a little easier. There's already
>>>>>>>>> enough going on that it's simpler to make room.
>>>>>>>>>
>>>>>>>>> Remember, the final goal is to support order-preserving serialization.
>>>>>>>>> This imposes some limitations on our encoding strategies. For
>>>>>>>>> instance, it's not enough to simply encode null; it really needs to be
>>>>>>>>> encoded as 0x00 so as to sort lexicographically earlier than any other
>>>>>>>>> value.
>>>>>>>>
>>>>>>>>> What do you think? Any ideas, experiences, etc?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Nick
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>

