hbase-user mailing list archives

From Adrian CAPDEFIER <chivas314...@gmail.com>
Subject Re: hbase schema design
Date Tue, 17 Sep 2013 17:52:57 GMT
Thanks for the tip. In the data warehousing world I used to call them
surrogate keys - I wonder if there's any difference between the two.
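
For readers following the thread, the string-to-integer association discussed here (a surrogate key, or dictionary entry) can be sketched in a few lines. This is only an in-memory illustration and not an HBase feature: the class name `SurrogateKeyDictionary` and its methods are hypothetical, and a real deployment would need to persist the mapping and allocate ids safely across concurrent writers.

```python
class SurrogateKeyDictionary:
    """Hypothetical in-memory 1-1 mapping between strings and integer ids."""

    def __init__(self):
        self._id_by_text = {}   # string -> surrogate id
        self._text_by_id = []   # surrogate id -> string (index is the id)

    def encode(self, text):
        """Return the existing id for text, or assign the next free one."""
        existing = self._id_by_text.get(text)
        if existing is not None:
            return existing
        new_id = len(self._text_by_id)
        self._id_by_text[text] = new_id
        self._text_by_id.append(text)
        return new_id

    def decode(self, surrogate_id):
        """Return the original string for a previously assigned id."""
        return self._text_by_id[surrogate_id]

    def rowkey(self, text):
        """Encode the id as a fixed-width 8-byte big-endian key, so that
        byte-wise ordering matches numeric ordering."""
        return self.encode(text).to_bytes(8, "big")
```

With this sketch, a long event description is replaced by an 8-byte key, and `decode` recovers the description when the data is read back.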


On Tue, Sep 17, 2013 at 6:41 PM, Vladimir Rodionov
<vrodionov@carrieriq.com> wrote:

> > Is there a built-in functionality to generate (integer) surrogate values in
> > hbase that can be used on the rowkey or does it need to be hand-coded
> > from scratch?
>
> There is no such functionality in HBase. What you are asking for is known as
> dictionary compression: a unique 1-1 association between arbitrary strings
> and numeric values.
>
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: vrodionov@carrieriq.com
>
> ________________________________________
> From: Ted Yu [yuzhihong@gmail.com]
> Sent: Tuesday, September 17, 2013 9:53 AM
> To: user@hbase.apache.org
> Subject: Re: hbase schema design
>
> I guess you were referring to section 6.3.2
>
> bq. rowkey is stored and/or read for every cell value
>
> The above is true.
>
> bq. the event description is a string of 0.1 to 2Kb
>
> You can enable Data Block encoding to reduce storage.
>
> Cheers
>
>
>
> On Tue, Sep 17, 2013 at 9:44 AM, Adrian CAPDEFIER <chivas314159@gmail.com>
> wrote:
>
> > Howdy all,
> >
> > I'm trying to use hbase for the first time (plenty of other experience with
> > RDBMS databases though), and I have a couple of questions after reading The
> > Book.
> >
> > I am a bit confused by the advice to reduce "the row size" in the hbase
> > book. It states that every cell value is accompanied by the coordinates
> > (row, column and timestamp). I'm just trying to be thorough, so am I to
> > understand that the rowkey is stored and/or read for every cell value in a
> > record or just once per column family in a record?
> >
> > I am intrigued by the rows as columns design as described in the book at
> > http://hbase.apache.org/book.html#rowkey.design. To make a long story
> > short, I will end up with a table to store event types and number of
> > occurrences in each day. I would prefer to have the event description as
> > the row key and the dates when it happened as columns - up to 7300 for
> > roughly 20 years.
> > However, the event description is a string of 0.1 to 2Kb and if it is
> > stored for each cell value, I will need to use a surrogate (shorter) value.
> >
> > Is there a built-in functionality to generate (integer) surrogate values in
> > hbase that can be used on the rowkey or does it need to be hand-coded
> > from scratch?
> >
>
>
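
Since the thread confirms that the rowkey is stored with every cell, a rough back-of-the-envelope calculation shows why a 0.1 to 2Kb rowkey matters for a row with up to 7300 date columns. The sketch below is an estimate under stated assumptions only: it counts rowkey bytes alone, treats 0.1Kb as 100 bytes and 2Kb as 2048 bytes, assumes an 8-byte surrogate, and ignores the rest of each cell's coordinates as well as any compression or data block encoding. The function name is hypothetical.

```python
def rowkey_bytes_per_row(rowkey_len_bytes, cells_per_row):
    """Bytes spent repeating the rowkey across every cell of one row
    (uncompressed, unencoded estimate)."""
    return rowkey_len_bytes * cells_per_row


CELLS_PER_ROW = 7300  # one column per day for roughly 20 years

# Assumed key sizes: 2Kb taken as 2048 bytes, 0.1Kb as 100 bytes,
# and a hypothetical 8-byte integer surrogate.
longest_description = rowkey_bytes_per_row(2048, CELLS_PER_ROW)
shortest_description = rowkey_bytes_per_row(100, CELLS_PER_ROW)
surrogate_key = rowkey_bytes_per_row(8, CELLS_PER_ROW)

print(longest_description)   # 14950400 bytes, about 14.3 MiB per row
print(shortest_description)  # 730000 bytes
print(surrogate_key)         # 58400 bytes
```

Under these assumptions the longest descriptions cost about 256 times more rowkey bytes per row than an 8-byte surrogate, before any encoding or compression is applied.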
