hbase-dev mailing list archives

From Enis Söztutar <enis....@gmail.com>
Subject Re: HBase Types: Explicit Null Support
Date Tue, 02 Apr 2013 03:38:40 GMT
I think having Int32, and NullableInt32 would support minimum overhead, as
well as allowing SQL semantics.
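
To make the Int32 / NullableInt32 split concrete, here is a minimal Java sketch. The class and method names (Int32Codecs, encodeInt32, encodeNullableInt32, compareBytes) are hypothetical, and the byte layout is an assumption rather than anything settled in this thread: the plain type stays at 4 bytes and covers the full int range, while the nullable type spends one header byte, with 0x00 reserved for null so that null sorts first.

```java
/** Hypothetical sketch: order-preserving Int32 and NullableInt32 encodings. */
final class Int32Codecs {
    private Int32Codecs() {}

    /** Non-nullable: exactly 4 bytes, full int range, no room for null. */
    static byte[] encodeInt32(int value) {
        // Flip the sign bit so unsigned byte order matches signed int order.
        int flipped = value ^ 0x80000000;
        return new byte[] {
            (byte) (flipped >>> 24), (byte) (flipped >>> 16),
            (byte) (flipped >>> 8), (byte) flipped
        };
    }

    static int decodeInt32(byte[] b, int offset) {
        int flipped = ((b[offset] & 0xff) << 24)
                | ((b[offset + 1] & 0xff) << 16)
                | ((b[offset + 2] & 0xff) << 8)
                | (b[offset + 3] & 0xff);
        return flipped ^ 0x80000000;
    }

    /** Nullable (assumed layout): 0x00 alone is null, 0x01 precedes a 4-byte value. */
    static byte[] encodeNullableInt32(Integer value) {
        if (value == null) {
            return new byte[] {0x00};
        }
        byte[] body = encodeInt32(value);
        return new byte[] {0x01, body[0], body[1], body[2], body[3]};
    }

    static Integer decodeNullableInt32(byte[] b) {
        if (b[0] == 0x00) {
            return null;
        }
        return decodeInt32(b, 1);
    }

    /** Lexicographic comparison of unsigned bytes, the order a byte-sorted rowkey would see. */
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }
}
```

With this layout the cost of null support is paid only by callers who pick the nullable type: one extra byte per value, and null compares lower than Integer.MIN_VALUE under compareBytes.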


On Mon, Apr 1, 2013 at 7:26 PM, Nick Dimiduk <ndimiduk@gmail.com> wrote:

> Furthermore, is it more important to support null values than to squeeze all
> representations into minimum size (4 bytes for int32, &c.)?
> On Apr 1, 2013 4:41 PM, "Nick Dimiduk" <ndimiduk@gmail.com> wrote:
>
> > On Mon, Apr 1, 2013 at 4:31 PM, James Taylor <jtaylor@salesforce.com> wrote:
> >
> >> From the SQL perspective, handling null is important.
> >
> >
> > From your perspective, is it critical to support NULLs, even at the
> > expense of fixed-width encodings altogether, or of representing the
> > full range of values? That is, you'd rather be able to represent NULL
> > than -2^31?
> >
> > On 04/01/2013 01:32 PM, Nick Dimiduk wrote:
> >>
> >>> Thanks for the thoughtful response (and code!).
> >>>
> >>> I'm thinking I will press forward with a base implementation that does
> >>> not support nulls. The idea is to provide an extensible set of
> >>> interfaces, so I think this will not box us into a corner later. That
> >>> is, a mirroring package could be implemented that supports null values
> >>> and accepts the relevant trade-offs.
> >>>
> >>> Thanks,
> >>> Nick
> >>>
> >>> On Mon, Apr 1, 2013 at 12:26 PM, Matt Corgan <mcorgan@hotpads.com>
> >>> wrote:
> >>>
> >>>> I spent some time this weekend extracting bits of our serialization
> >>>> code to a public github repo at http://github.com/hotpads/data-tools .
> >>>> Contributions are welcome - I'm sure we all have this stuff lying
> >>>> around.
> >>>>
> >>>> You can see I've bumped into the NULL problem in a few places:
> >>>> * https://github.com/hotpads/data-tools/blob/master/src/main/java/com/hotpads/data/primitive/lists/LongArrayList.java
> >>>> * https://github.com/hotpads/data-tools/blob/master/src/main/java/com/hotpads/data/types/floats/DoubleByteTool.java
> >>>>
> >>>> Looking back, I think my latest opinion on the topic is to reject
> >>>> nullability as the rule since it can cause unexpected behavior and
> >>>> confusion.  It's cleaner to provide a wrapper class (so both
> >>>> LongArrayList plus NullableLongArrayList) that explicitly defines the
> >>>> behavior, and costs a little more in performance.  If the user can't
> >>>> find a pre-made wrapper class, it's not very difficult for each user to
> >>>> provide their own interpretation of null and check for it themselves.
> >>>>
> >>>> If you reject nullability, the question becomes what to do in
> >>>> situations where you're implementing existing interfaces that accept
> >>>> nullable params.  The LongArrayList above implements List<Long> which
> >>>> requires an add(Long) method.  In the above implementation I chose to
> >>>> swap nulls with Long.MIN_VALUE, however I'm now thinking it best to
> >>>> force the user to make that swap and then throw
> >>>> IllegalArgumentException if they pass null.
> >>>>
> >>>>
> >>>> On Mon, Apr 1, 2013 at 11:41 AM, Doug Meil <doug.meil@explorysmedical.com> wrote:
> >>>>
> >>>>> Hmmm... good question.
> >>>>>
> >>>>> I think that fixed-width support is important for a great many rowkey
> >>>>> constructs, so I'd rather see something like losing MIN_VALUE and
> >>>>> keeping fixed width.
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 4/1/13 2:00 PM, "Nick Dimiduk" <ndimiduk@gmail.com> wrote:
> >>>>>
> >>>>>> Heya,
> >>>>>>
> >>>>>> Thinking about data types and serialization. I think null support is an
> >>>>>> important characteristic for the serialized representations, especially
> >>>>>> when considering the compound type. However, doing so is directly
> >>>>>> incompatible with fixed-width representations for numerics. For
> >>>>>> instance, if we want to have a fixed-width signed long stored on 8
> >>>>>> bytes, where do you put null? float and double types can cheat a little
> >>>>>> by folding negative and positive NaNs into a single representation
> >>>>>> (this isn't strictly correct!), leaving a place to represent null. In
> >>>>>> the long example case, the obvious choice is to reduce MAX_VALUE or
> >>>>>> increase MIN_VALUE by one. This will allocate an additional encoding
> >>>>>> which can be used for null. My experience working with scientific data,
> >>>>>> however, makes me wince at the idea.
> >>>>>>
> >>>>>> The variable-width encodings have it a little easier. There's already
> >>>>>> enough going on that it's simpler to make room.
> >>>>>>
> >>>>>> Remember, the final goal is to support order-preserving serialization.
> >>>>>> This imposes some limitations on our encoding strategies. For instance,
> >>>>>> it's not enough to simply encode null, it really needs to be encoded as
> >>>>>> 0x00 so as to sort lexicographically earlier than any other value.
> >>>>>>
> >>>>>> What do you think? Any ideas, experiences, etc?
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Nick
> >>>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>
> >
>
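
For reference, the NaN-folding idea from Nick's original message quoted above can be sketched in Java as follows. This is an illustration under stated assumptions, not code from the thread or from the data-tools repo: the class name NullableDoubleCodec and its methods are hypothetical. It relies on Double.doubleToLongBits collapsing every NaN to the single canonical pattern 0x7ff8000000000000L, which leaves the all-zero 8-byte encoding unused by any real value and therefore free to stand for null while still sorting first.

```java
/** Hypothetical sketch: fixed-width, order-preserving double encoding with a slot for null. */
final class NullableDoubleCodec {
    private NullableDoubleCodec() {}

    static byte[] encode(Double value) {
        long sortable;
        if (value == null) {
            // All-zero bytes: sorts before every encoded double.
            sortable = 0L;
        } else {
            // doubleToLongBits folds all NaNs (either sign) into one positive NaN.
            long bits = Double.doubleToLongBits(value);
            // Negative values: invert all bits. Non-negative values: flip the sign bit.
            sortable = bits < 0 ? ~bits : bits ^ Long.MIN_VALUE;
        }
        byte[] out = new byte[8];
        for (int i = 7; i >= 0; i--) {
            out[i] = (byte) sortable;
            sortable >>>= 8;
        }
        return out;
    }

    static Double decode(byte[] b) {
        long sortable = 0L;
        for (int i = 0; i < 8; i++) {
            sortable = (sortable << 8) | (b[i] & 0xff);
        }
        if (sortable == 0L) {
            return null; // the slot freed by NaN folding
        }
        long bits = sortable < 0 ? sortable ^ Long.MIN_VALUE : ~sortable;
        return Double.longBitsToDouble(bits);
    }

    /** Lexicographic comparison of unsigned bytes. */
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }
}
```

The trade-off is the one flagged in the quoted message: distinct NaN payloads and the NaN sign are lost on a round trip, so this is only acceptable where NaNs are treated as interchangeable.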
