Date: Mon, 1 Apr 2013 17:10:41 -0700
Subject: Re: HBase Types: Explicit Null Support
From: Ted Yu
To: "dev@hbase.apache.org"

bq. I create a dummy qualifier with a dummy value

For any single application, the above can be done.
For generic applications, how would we do this?

Thanks

On Mon, Apr 1, 2013 at 5:07 PM, Matt Corgan wrote:

> I generally don't allow nulls in my composite row keys. Does SQL allow
> nulls in the PK? In the rare case I wanted to do that I might create a
> separate format called NullableCInt32 with 5 bytes where the first one
> determined null. It's important to keep the pure types pure.
>
> I have lots of null *values* however, but they're represented by lack of a
> qualifier in the Put. If a row has all null values, I create a dummy
> qualifier with a dummy value to make sure the row key gets inserted as it
> would in sql.
>
>
> On Mon, Apr 1, 2013 at 4:49 PM, James Taylor wrote:
>
> > On 04/01/2013 04:41 PM, Nick Dimiduk wrote:
> >
> >> On Mon, Apr 1, 2013 at 4:31 PM, James Taylor wrote:
> >>
> >>> From the SQL perspective, handling null is important.
> >>
> >> From your perspective, it is critical to support NULLs, even at the
> >> expense of fixed-width encodings at all or supporting representation of
> >> a full range of values. That is, you'd rather be able to represent NULL
> >> than -2^31?
> >>
> > We've been able to get away with supporting NULL through the absence of
> > the value rather than restricting the data range. We haven't had any push
> > back on not allowing a fixed width nullable leading row key column. Since
> > our variable length DECIMAL supports null and is a superset of the fixed
> > width numeric types, users have a reasonable alternative.
> >
> > I'd rather not restrict the range of values, since it doesn't seem like
> > this would be necessary.
> >
> >
> >> On 04/01/2013 01:32 PM, Nick Dimiduk wrote:
> >>
> >>>> Thanks for the thoughtful response (and code!).
> >>>>
> >>>> I'm thinking I will press forward with a base implementation that does
> >>>> not support nulls. The idea is to provide an extensible set of
> >>>> interfaces, so I think this will not box us into a corner later. That
> >>>> is, a mirroring package could be implemented that supports null values
> >>>> and accepts the relevant trade-offs.
> >>>>
> >>>> Thanks,
> >>>> Nick
> >>>>
> >>>> On Mon, Apr 1, 2013 at 12:26 PM, Matt Corgan wrote:
> >>>>
> >>>>> I spent some time this weekend extracting bits of our serialization
> >>>>> code to a public github repo at http://github.com/hotpads/data-tools .
> >>>>> Contributions are welcome - I'm sure we all have this stuff laying
> >>>>> around.
> >>>>>
> >>>>> You can see I've bumped into the NULL problem in a few places:
> >>>>> * https://github.com/hotpads/data-tools/blob/master/src/main/java/com/hotpads/data/primitive/lists/LongArrayList.java
> >>>>> * https://github.com/hotpads/data-tools/blob/master/src/main/java/com/hotpads/data/types/floats/DoubleByteTool.java
> >>>>>
> >>>>> Looking back, I think my latest opinion on the topic is to reject
> >>>>> nullability as the rule since it can cause unexpected behavior and
> >>>>> confusion. It's cleaner to provide a wrapper class (so both
> >>>>> LongArrayList plus NullableLongArrayList) that explicitly defines the
> >>>>> behavior, and costs a little more in performance. If the user can't
> >>>>> find a pre-made wrapper class, it's not very difficult for each user
> >>>>> to provide their own interpretation of null and check for it
> >>>>> themselves.
> >>>>>
> >>>>> If you reject nullability, the question becomes what to do in
> >>>>> situations where you're implementing existing interfaces that accept
> >>>>> nullable params. The LongArrayList above implements List<Long> which
> >>>>> requires an add(Long) method.
> >>>>> In the above implementation I chose to swap nulls with
> >>>>> Long.MIN_VALUE, however I'm now thinking it best to force the user to
> >>>>> make that swap and then throw IllegalArgumentException if they pass
> >>>>> null.
> >>>>>
> >>>>>
> >>>>> On Mon, Apr 1, 2013 at 11:41 AM, Doug Meil
> >>>>> <doug.meil@explorysmedical.com> wrote:
> >>>>>
> >>>>>> Hmmm... good question.
> >>>>>>
> >>>>>> I think that fixed width support is important for a great many rowkey
> >>>>>> construct cases, so I'd rather see something like losing MIN_VALUE
> >>>>>> and keeping fixed width.
> >>>>>>
> >>>>>>
> >>>>>> On 4/1/13 2:00 PM, "Nick Dimiduk" wrote:
> >>>>>>
> >>>>>>> Heya,
> >>>>>>>
> >>>>>>> Thinking about data types and serialization. I think null support is
> >>>>>>> an important characteristic for the serialized representations,
> >>>>>>> especially when considering the compound type. However, doing so is
> >>>>>>> directly incompatible with fixed-width representations for numerics.
> >>>>>>> For instance, if we want to have a fixed-width signed long stored on
> >>>>>>> 8-bytes, where do you put null? float and double types can cheat a
> >>>>>>> little by folding negative and positive NaN's into a single
> >>>>>>> representation (this isn't strictly correct!), leaving a place to
> >>>>>>> represent null. In the long example case, the obvious choice is to
> >>>>>>> reduce MAX_VALUE or increase MIN_VALUE by one. This will allocate an
> >>>>>>> additional encoding which can be used for null. My experience
> >>>>>>> working with scientific data, however, makes me wince at the idea.
> >>>>>>>
> >>>>>>> The variable-width encodings have it a little easier. There's
> >>>>>>> already enough going on that it's simpler to make room.
> >>>>>>>
> >>>>>>> Remember, the final goal is to support order-preserving
> >>>>>>> serialization. This imposes some limitations on our encoding
> >>>>>>> strategies. For instance, it's not enough to simply encode null, it
> >>>>>>> really needs to be encoded as 0x00 so as to sort lexicographically
> >>>>>>> earlier than any other value.
> >>>>>>>
> >>>>>>> What do you think? Any ideas, experiences, etc?
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Nick
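
For reference, the 5-byte nullable format mentioned in the thread (a leading byte that determines null, followed by a fixed-width int) could look roughly like the sketch below. It is only an illustration under stated assumptions: the class name NullableInt32Codec and its methods are hypothetical and are not taken from HBase or from the hotpads data-tools repo. Null is encoded with a leading 0x00 so that it sorts before every value, and the sign bit is flipped so that unsigned byte-wise comparison matches numeric order.

```java
/**
 * Hypothetical sketch of a nullable, order-preserving 32-bit integer codec.
 * All names here are illustrative assumptions, not an existing HBase API.
 */
final class NullableInt32Codec {
    static final int ENCODED_LENGTH = 5;
    private static final byte NULL_MARKER = 0x00;  // sorts before any value
    private static final byte VALUE_MARKER = 0x01;

    private NullableInt32Codec() {}

    /** Encodes a possibly-null Integer into exactly 5 bytes. */
    static byte[] encode(Integer value) {
        byte[] out = new byte[ENCODED_LENGTH]; // zero-filled, so out[0] == NULL_MARKER
        if (value == null) {
            return out;
        }
        out[0] = VALUE_MARKER;
        // Flip the sign bit so negatives sort before positives when the
        // bytes are compared as unsigned values.
        int flipped = value ^ 0x80000000;
        out[1] = (byte) (flipped >>> 24);
        out[2] = (byte) (flipped >>> 16);
        out[3] = (byte) (flipped >>> 8);
        out[4] = (byte) flipped;
        return out;
    }

    /** Decodes 5 bytes produced by encode; returns null for the null marker. */
    static Integer decode(byte[] encoded) {
        if (encoded.length != ENCODED_LENGTH) {
            throw new IllegalArgumentException("expected " + ENCODED_LENGTH + " bytes");
        }
        if (encoded[0] == NULL_MARKER) {
            return null;
        }
        int flipped = ((encoded[1] & 0xFF) << 24)
                | ((encoded[2] & 0xFF) << 16)
                | ((encoded[3] & 0xFF) << 8)
                | (encoded[4] & 0xFF);
        return flipped ^ 0x80000000;
    }

    /** Lexicographic comparison treating each byte as unsigned. */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }
}
```

The trade-off matches the one discussed above: the full int range stays representable, including Integer.MIN_VALUE, at the cost of one extra byte per value, while the plain 4-byte type can remain null-free.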