MIME-Version: 1.0
In-Reply-To:
References: <515A18CB.3050601@salesforce.com>
From: Enis Söztutar
Date: Mon, 1 Apr 2013 20:38:40 -0700
Message-ID:
Subject: Re: HBase Types: Explicit Null Support
To: hbase-user
Cc: hbase-dev
Content-Type: multipart/alternative; boundary=00248c71180d77159604d9587a6b

--00248c71180d77159604d9587a6b
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

I think having both Int32 and NullableInt32 would keep overhead to a minimum
while still allowing SQL semantics.

On Mon, Apr 1, 2013 at 7:26 PM, Nick Dimiduk wrote:

> Furthermore, is it more important to support null values than to squeeze all
> representations into minimum size (4 bytes for int32, &c.)?
> On Apr 1, 2013 4:41 PM, "Nick Dimiduk" wrote:
>
> > On Mon, Apr 1, 2013 at 4:31 PM, James Taylor wrote:
> >
> >> From the SQL perspective, handling null is important.
> >
> >
> > From your perspective, it is critical to support NULLs, even at the
> > expense of having fixed-width encodings at all, or of representing the
> > full range of values. That is, you'd rather be able to represent NULL than
> > -2^31?
> >
> > On 04/01/2013 01:32 PM, Nick Dimiduk wrote:
> >>
> >>> Thanks for the thoughtful response (and code!).
> >>>
> >>> I'm thinking I will press forward with a base implementation that does
> >>> not support nulls. The idea is to provide an extensible set of
> >>> interfaces, so I think this will not box us into a corner later. That
> >>> is, a mirroring package could be implemented that supports null values
> >>> and accepts the relevant trade-offs.
> >>>
> >>> Thanks,
> >>> Nick
> >>>
> >>> On Mon, Apr 1, 2013 at 12:26 PM, Matt Corgan wrote:
> >>>
> >>>> I spent some time this weekend extracting bits of our serialization
> >>>> code to a public github repo at http://github.com/hotpads/data-tools.
> >>>> Contributions are welcome - I'm sure we all have this stuff lying
> >>>> around.
> >>>>
> >>>> You can see I've bumped into the NULL problem in a few places:
> >>>> * https://github.com/hotpads/data-tools/blob/master/src/main/java/com/hotpads/data/primitive/lists/LongArrayList.java
> >>>> * https://github.com/hotpads/data-tools/blob/master/src/main/java/com/hotpads/data/types/floats/DoubleByteTool.java
> >>>>
> >>>> Looking back, I think my latest opinion on the topic is to reject
> >>>> nullability as the rule since it can cause unexpected behavior and
> >>>> confusion. It's cleaner to provide a wrapper class (so both
> >>>> LongArrayList plus NullableLongArrayList) that explicitly defines the
> >>>> behavior, and costs a little more in performance. If the user can't
> >>>> find a pre-made wrapper class, it's not very difficult for each user to
> >>>> provide their own interpretation of null and check for it themselves.
> >>>>
> >>>> If you reject nullability, the question becomes what to do in
> >>>> situations where you're implementing existing interfaces that accept
> >>>> nullable params. The LongArrayList above implements List<Long> which
> >>>> requires an add(Long) method.
> >>>> In the above implementation I chose to swap nulls with
> >>>> Long.MIN_VALUE, however I'm now thinking it best to force the user to
> >>>> make that swap and then throw IllegalArgumentException if they pass
> >>>> null.
> >>>>
> >>>>
> >>>> On Mon, Apr 1, 2013 at 11:41 AM, Doug Meil <doug.meil@explorysmedical.com> wrote:
> >>>>
> >>>>> Hmmm... good question.
> >>>>>
> >>>>> I think that fixed width support is important for a great many rowkey
> >>>>> constructs, so I'd rather see something like losing MIN_VALUE and
> >>>>> keeping fixed width.
> >>>>>
> >>>>>
> >>>>> On 4/1/13 2:00 PM, "Nick Dimiduk" wrote:
> >>>>>
> >>>>>> Heya,
> >>>>>>
> >>>>>> Thinking about data types and serialization. I think null support is
> >>>>>> an important characteristic for the serialized representations,
> >>>>>> especially when considering the compound type. However, doing so is
> >>>>>> directly incompatible with fixed-width representations for numerics.
> >>>>>> For instance, if we want to have a fixed-width signed long stored on
> >>>>>> 8 bytes, where do you put null? float and double types can cheat a
> >>>>>> little by folding negative and positive NaN's into a single
> >>>>>> representation (this isn't strictly correct!), leaving a place to
> >>>>>> represent null. In the long example case, the obvious choice is to
> >>>>>> reduce MAX_VALUE or increase MIN_VALUE by one. This will allocate an
> >>>>>> additional encoding which can be used for null. My experience working
> >>>>>> with scientific data, however, makes me wince at the idea.
> >>>>>>
> >>>>>> The variable-width encodings have it a little easier. There's already
> >>>>>> enough going on that it's simpler to make room.
> >>>>>>
> >>>>>> Remember, the final goal is to support order-preserving serialization.
> >>>>>> This imposes some limitations on our encoding strategies. For
> >>>>>> instance, it's not enough to simply encode null, it really needs to
> >>>>>> be encoded as 0x00 so as to sort lexicographically earlier than any
> >>>>>> other value.
> >>>>>>
> >>>>>> What do you think? Any ideas, experiences, etc?
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Nick
> >>>>>>

--00248c71180d77159604d9587a6b--
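
For illustration only, here is a rough Java sketch of the Int32 / NullableInt32 split discussed in this thread. It is not code from HBase or from the data-tools repo. The class name OrderedInt32, its method names, and the 0x01 "value present" marker are all hypothetical; only the idea of encoding null as 0x00 so that it sorts first comes from the thread. The non-nullable form stays at 4 bytes and keeps the full int range; the nullable form spends one extra header byte.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: order-preserving int32 encodings, with and without null.
final class OrderedInt32 {
    private static final byte NULL_MARKER = 0x00;   // null sorts before every value
    private static final byte VALUE_MARKER = 0x01;  // assumed marker for "value present"

    private OrderedInt32() {}

    // Fixed width (4 bytes), no null. Flipping the sign bit makes the
    // big-endian unsigned byte order agree with signed integer order.
    static byte[] encode(int value) {
        return ByteBuffer.allocate(4).putInt(value ^ Integer.MIN_VALUE).array();
    }

    static int decode(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getInt() ^ Integer.MIN_VALUE;
    }

    // Nullable variant: a single 0x00 byte for null, otherwise a 0x01 header
    // byte followed by the same 4-byte encoding as above.
    static byte[] encodeNullable(Integer value) {
        if (value == null) {
            return new byte[] { NULL_MARKER };
        }
        return ByteBuffer.allocate(5)
                .put(VALUE_MARKER)
                .putInt(value ^ Integer.MIN_VALUE)
                .array();
    }

    static Integer decodeNullable(byte[] bytes) {
        if (bytes[0] == NULL_MARKER) {
            return null;
        }
        return ByteBuffer.wrap(bytes, 1, 4).getInt() ^ Integer.MIN_VALUE;
    }

    // Lexicographic comparison of byte arrays, treating each byte as unsigned,
    // which is how row keys are assumed to be ordered here.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = (a[i] & 0xff) - (b[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        return a.length - b.length;
    }
}
```

Under these assumptions, encodeNullable(null) compares lower than encodeNullable(Integer.MIN_VALUE), so -2^31 remains representable at the cost of one byte per value, while callers who never need null can stay with the 4-byte encode form.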