Date: Tue, 17 Sep 2013 18:52:57 +0100
Subject: Re: hbase schema design
From: Adrian CAPDEFIER
To: user@hbase.apache.org

Thanks for the tip. In the data warehousing world I used to call them
surrogate keys - I wonder if there's any difference between the two.

On Tue, Sep 17, 2013 at 6:41 PM, Vladimir Rodionov wrote:

> > Is there a built-in functionality to generate (integer) surrogate values in
> > hbase that can be used on the rowkey or does it need to be hand-coded
> > from scratch?
>
> There is no such functionality in HBase. What you are asking for is known as
> dictionary compression: a unique 1-1 association between arbitrary strings
> and numeric values.
>
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: vrodionov@carrieriq.com
>
> ________________________________________
> From: Ted Yu [yuzhihong@gmail.com]
> Sent: Tuesday, September 17, 2013 9:53 AM
> To: user@hbase.apache.org
> Subject: Re: hbase schema design
>
> I guess you were referring to section 6.3.2
>
> bq. rowkey is stored and/or read for every cell value
>
> The above is true.
>
> bq. the event description is a string of 0.1 to 2 KB
>
> You can enable Data Block encoding to reduce storage.
>
> Cheers
>
> On Tue, Sep 17, 2013 at 9:44 AM, Adrian CAPDEFIER wrote:
>
> > Howdy all,
> >
> > I'm trying to use hbase for the first time (plenty of other experience with
> > RDBMS database though), and I have a couple of questions after reading The
> > Book.
> >
> > I am a bit confused by the advice to reduce "the row size" in the hbase
> > book. It states that every cell value is accompanied by the coordinates
> > (row, column and timestamp). I'm just trying to be thorough, so am I to
> > understand that the rowkey is stored and/or read for every cell value in a
> > record or just once per column family in a record?
> >
> > I am intrigued by the rows as columns design as described in the book at
> > http://hbase.apache.org/book.html#rowkey.design. To make a long story
> > short, I will end up with a table to store event types and number of
> > occurrences in each day. I would prefer to have the event description as
> > the row key and the dates when it happened as columns - up to 7300 for
> > roughly 20 years.
> > However, the event description is a string of 0.1 to 2 KB and if it is
> > stored for each cell value, I will need to use a surrogate (shorter) value.
> >
> > Is there a built-in functionality to generate (integer) surrogate values in
> > hbase that can be used on the rowkey or does it need to be hand-coded
> > from scratch?
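
For what it's worth, the dictionary (surrogate key) idea discussed in this
thread can be sketched roughly as below. This is only an illustration under
stated assumptions, not an HBase feature: the names SurrogateKeyDictionary,
encode_rowkey and rowkey_overhead_bytes are hypothetical, the mapping lives in
memory, and IDs are handed out sequentially by a single process. A real setup
would presumably need to persist the mapping and allocate IDs atomically,
which is not shown here. The overhead helper simply multiplies key length by
cell count, following the point that the rowkey is stored with every cell, and
ignores any compression or block encoding.

```python
import struct


class SurrogateKeyDictionary:
    """Hypothetical in-memory 1-1 mapping between strings and integer IDs."""

    def __init__(self):
        self._id_by_text = {}   # description -> surrogate ID
        self._text_by_id = []   # surrogate ID -> description

    def get_or_assign(self, text):
        """Return the existing ID for text, or assign the next free one."""
        key = self._id_by_text.get(text)
        if key is None:
            key = len(self._text_by_id)
            self._id_by_text[text] = key
            self._text_by_id.append(text)
        return key

    def lookup(self, key):
        """Return the original string for a surrogate ID."""
        return self._text_by_id[key]


def encode_rowkey(surrogate):
    """Pack the ID as 4 big-endian bytes so byte order matches numeric order."""
    return struct.pack(">I", surrogate)


def rowkey_overhead_bytes(rowkey_len, cells_per_row):
    """Uncompressed bytes spent on the rowkey if it is repeated for every cell."""
    return rowkey_len * cells_per_row


if __name__ == "__main__":
    events = SurrogateKeyDictionary()
    event_id = events.get_or_assign("some long event description ...")
    print(encode_rowkey(event_id))
    # Worst case from the thread: 2 KB description, 7300 daily columns.
    print(rowkey_overhead_bytes(2048, 7300), "bytes with the description as rowkey")
    print(rowkey_overhead_bytes(4, 7300), "bytes with a 4-byte surrogate as rowkey")
```

The 4-byte width is an arbitrary choice for the sketch; a wider key would be
needed if more than about four billion distinct descriptions were expected.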