From: Nick Dimiduk
Date: Mon, 1 Apr 2013 11:00:46 -0700
Subject: HBase Types: Explicit Null Support
To: hbase-dev, hbase-user

Heya,

Thinking about data types and serialization. I think null support is an important characteristic for the serialized representations, especially when considering the compound type. However, doing so is directly incompatible with fixed-width representations for numerics. For instance, if we want to have a fixed-width signed long stored in 8 bytes, where do you put null? float and double types can cheat a little by folding negative and positive NaNs into a single representation (this isn't strictly correct!), leaving a place to represent null. In the long example case, the obvious choice is to reduce MAX_VALUE or increase MIN_VALUE by one. This frees up one encoding, which can be used for null. My experience working with scientific data, however, makes me wince at the idea.

The variable-width encodings have it a little easier. There's already enough going on that it's simpler to make room.

Remember, the final goal is to support order-preserving serialization. This imposes some limitations on our encoding strategies. For instance, it's not enough to simply encode null; it really needs to be encoded as 0x00 so as to sort lexicographically earlier than any other value.

What do you think? Any ideas, experiences, etc?

Thanks,
Nick
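
To make the fixed-width trade-off above concrete, here is a minimal sketch in Python of the "increase MIN_VALUE by one" option for an 8-byte signed long. It is not HBase code: the names `encode_nullable_long`, `decode_nullable_long`, and `NULL_ENCODING` are hypothetical, and the sketch assumes a big-endian layout with the sign bit flipped so that unsigned byte-wise comparison matches signed numeric order. Under that assumption the smallest long would encode as eight 0x00 bytes, so giving that slot to null makes null sort before every remaining value.

```python
import struct

INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1

# Assumption for this sketch: null takes the all-zero slot that would
# otherwise encode INT64_MIN, so it sorts before every real value.
NULL_ENCODING = b"\x00" * 8


def encode_nullable_long(value):
    """Encode None or a signed 64-bit integer as 8 order-preserving bytes."""
    if value is None:
        return NULL_ENCODING
    if not (INT64_MIN < value <= INT64_MAX):
        # INT64_MIN is the value given up to make room for null.
        raise ValueError("value out of range for a nullable long")
    # Adding 2**63 flips the sign bit; big-endian unsigned bytes then
    # compare in the same order as the original signed values.
    return struct.pack(">Q", value + 2**63)


def decode_nullable_long(data):
    """Invert encode_nullable_long."""
    if len(data) != 8:
        raise ValueError("expected exactly 8 bytes")
    if data == NULL_ENCODING:
        return None
    (unsigned,) = struct.unpack(">Q", data)
    return unsigned - 2**63
```

Sorting the encoded byte strings then puts null first, followed by the values in numeric order. The cost is the one raised above: the smallest long is no longer representable, which is the part that may be hard to accept for scientific data.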