Message-ID: <515A18CB.3050601@salesforce.com>
Date: Mon, 1 Apr 2013 16:31:23 -0700
From: James Taylor
To: "dev@hbase.apache.org"
CC: hbase-user
Subject: Re: HBase Types: Explicit Null Support
Content-Type: text/plain;
 charset="UTF-8"; format=flowed

From the SQL perspective, handling null is important. Phoenix supports
null in the following ways:
- the absence of a key value
- an empty value in a key value
- an empty value in a multi part row key
    - for variable length types (VARCHAR and DECIMAL) a null byte
      separator would be used if not the last column
    - for fixed width types only the last column is allowed to be null

As you mentioned, it's important to maintain the lexicographical sort
order with nulls being first.

On 04/01/2013 01:32 PM, Nick Dimiduk wrote:
> Thanks for the thoughtful response (and code!).
>
> I'm thinking I will press forward with a base implementation that does not
> support nulls. The idea is to provide an extensible set of interfaces, so I
> think this will not box us into a corner later. That is, a mirroring
> package could be implemented that supports null values and accepts
> the relevant trade-offs.
>
> Thanks,
> Nick
>
> On Mon, Apr 1, 2013 at 12:26 PM, Matt Corgan wrote:
>
>> I spent some time this weekend extracting bits of our serialization code to
>> a public github repo at http://github.com/hotpads/data-tools.
>> Contributions are welcome - I'm sure we all have this stuff laying around.
>>
>> You can see I've bumped into the NULL problem in a few places:
>> * https://github.com/hotpads/data-tools/blob/master/src/main/java/com/hotpads/data/primitive/lists/LongArrayList.java
>> * https://github.com/hotpads/data-tools/blob/master/src/main/java/com/hotpads/data/types/floats/DoubleByteTool.java
>>
>> Looking back, I think my latest opinion on the topic is to reject
>> nullability as the rule since it can cause unexpected behavior and
>> confusion. It's cleaner to provide a wrapper class (so both LongArrayList
>> plus NullableLongArrayList) that explicitly defines the behavior, and costs
>> a little more in performance.
>> If the user can't find a pre-made wrapper
>> class, it's not very difficult for each user to provide their own
>> interpretation of null and check for it themselves.
>>
>> If you reject nullability, the question becomes what to do in situations
>> where you're implementing existing interfaces that accept nullable params.
>> The LongArrayList above implements List which requires an add(Long)
>> method. In the above implementation I chose to swap nulls with
>> Long.MIN_VALUE, however I'm now thinking it best to force the user to make
>> that swap and then throw IllegalArgumentException if they pass null.
>>
>>
>> On Mon, Apr 1, 2013 at 11:41 AM, Doug Meil wrote:
>>> Hmmm... good question.
>>>
>>> I think that fixed width support is important for a great many rowkey
>>> constructs cases, so I'd rather see something like losing MIN_VALUE and
>>> keeping fixed width.
>>>
>>> On 4/1/13 2:00 PM, "Nick Dimiduk" wrote:
>>>
>>>> Heya,
>>>>
>>>> Thinking about data types and serialization. I think null support is an
>>>> important characteristic for the serialized representations, especially
>>>> when considering the compound type. However, doing so is directly
>>>> incompatible with fixed-width representations for numerics. For instance,
>>>> if we want to have a fixed-width signed long stored on 8-bytes, where do
>>>> you put null? float and double types can cheat a little by folding
>>>> negative and positive NaN's into a single representation (this isn't
>>>> strictly correct!), leaving a place to represent null. In the long
>>>> example case, the obvious choice is to reduce MAX_VALUE or increase
>>>> MIN_VALUE by one. This will allocate an additional encoding which can be
>>>> used for null. My experience working with scientific data, however,
>>>> makes me wince at the idea.
>>>>
>>>> The variable-width encodings have it a little easier. There's already
>>>> enough going on that it's simpler to make room.
>>>>
>>>> Remember, the final goal is to support order-preserving serialization.
>>>> This imposes some limitations on our encoding strategies. For instance,
>>>> it's not enough to simply encode null, it really needs to be encoded as
>>>> 0x00 so as to sort lexicographically earlier than any other value.
>>>>
>>>> What do you think? Any ideas, experiences, etc?
>>>>
>>>> Thanks,
>>>> Nick
>>>
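
For reference, the trade-off discussed in this thread can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the encoding used by Phoenix, HBase, or data-tools: the class name NullableLongCodec, the marker bytes 0x00/0x01, and the method names are all hypothetical. The plain encode() keeps the fixed 8-byte width and the full long range but has no room for null. encodeNullable() gives up the fixed width by adding a leading marker byte, so that null is the single byte 0x00 and sorts before every non-null value under unsigned lexicographic byte comparison.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: order-preserving long encoding with an optional null marker.
final class NullableLongCodec {
    static final byte NULL_MARKER = 0x00;   // assumption: null is the single byte 0x00
    static final byte VALUE_MARKER = 0x01;  // assumption: non-null values start with 0x01

    // Fixed-width, non-nullable: flipping the sign bit makes the big-endian
    // bytes sort in the same order as the signed long values.
    static byte[] encode(long v) {
        return ByteBuffer.allocate(8).putLong(v ^ Long.MIN_VALUE).array();
    }

    // Nullable: one marker byte, then the 8 value bytes when present.
    static byte[] encodeNullable(Long v) {
        if (v == null) {
            return new byte[] { NULL_MARKER };
        }
        return ByteBuffer.allocate(9).put(VALUE_MARKER).putLong(v ^ Long.MIN_VALUE).array();
    }

    static Long decodeNullable(byte[] b) {
        if (b.length == 1 && b[0] == NULL_MARKER) {
            return null;
        }
        return ByteBuffer.wrap(b, 1, 8).getLong() ^ Long.MIN_VALUE;
    }

    // Unsigned lexicographic comparison, the order in which byte[] keys sort.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        Long[] values = { null, Long.MIN_VALUE, -1L, 0L, Long.MAX_VALUE };
        for (int i = 1; i < values.length; i++) {
            byte[] prev = encodeNullable(values[i - 1]);
            byte[] cur = encodeNullable(values[i]);
            System.out.println(values[i - 1] + " sorts before " + values[i] + ": "
                    + (compareBytes(prev, cur) < 0));
        }
    }
}
```

The cost is the one Nick and Matt describe: the nullable form takes 9 bytes instead of 8, which is why a separate nullable wrapper alongside a null-free base type is a reasonable split.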