hbase-user mailing list archives

From lars hofhansl <lhofha...@yahoo.com>
Subject Re: column based or row based storage for HBase?
Date Mon, 06 Aug 2012 04:30:18 GMT
A key in HBase looks like this: (rowkey, column family, column, timestamp)

HBase will do two things for you:
1. All keys that have the same row key are stored in the same region
2. All keys are sorted


(The column family is special in that each column family has its own store files, but the logical
sort order still holds).
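
To make the sort order concrete, here is a minimal Python sketch of it. This is only a model, not HBase code: the `Key` tuple, the `sort_order` helper and the sample cells are made up for illustration, and plain strings stand in for the byte arrays HBase actually compares. Sorting newer timestamps first is how HBase orders versions of the same cell.

```python
from typing import NamedTuple


class Key(NamedTuple):
    """Hypothetical model of an HBase key; real keys are byte arrays."""
    row: str
    family: str
    column: str
    timestamp: int


def sort_order(key: Key):
    # Row, family and column ascending; newest timestamp first.
    return (key.row, key.family, key.column, -key.timestamp)


# Made-up cells, deliberately inserted out of order.
cells = {
    Key("row2", "family1", "column1", 5): "c",
    Key("row1", "family1", "column2", 5): "b",
    Key("row1", "family1", "column1", 3): "a-old",
    Key("row1", "family1", "column1", 7): "a-new",
}

sorted_keys = sorted(cells, key=sort_order)
for key in sorted_keys:
    print(key, "->", cells[key])
```

All keys for row1 come out next to each other, and the newest version of a cell comes first.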

Think of it this way.
Say you have two column families and two regions (A and B). You find the following ordering:
Storefile(s) for column family 1 in Region A:
(row1, column family1, column1, ts)->value
(row1, column family1, column2, ts)->value
(row2, column family1, column1, ts)->value
(row2, column family1, column2, ts)->value

Storefile(s) for column family 1 in Region B:
(row3, column family1, column1, ts)->value
(row3, column family1, column2, ts)->value

Storefile(s) for column family 2 in Region A:
(row1, column family2, column1, ts)->value
(row1, column family2, column2, ts)->value
(row2, column family2, column1, ts)->value
(row2, column family2, column2, ts)->value

Storefile(s) for column family 2 in Region B:
(row3, column family2, column1, ts)->value
(row3, column family2, column2, ts)->value

So region A has rows row1 and row2, region B has row3.
A region is a shard of a table based on the row key.

#1 above means that HBase will never place key values for "row1" in different regions.
#2 means you can very efficiently locate specific keys, as they are always stored sorted.
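
The two properties can be sketched in a few lines of Python. Again this is a toy model under stated assumptions, not how HBase is implemented: the split point "row3", the region names and the helpers `region_for` and `contains` are hypothetical, and timestamps are left out to keep it short.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

# Hypothetical split point: rows below "row3" go to region A, the rest to B.
SPLIT_POINTS = ["row3"]
REGION_NAMES = ["A", "B"]


def region_for(row):
    """Pick the region from the row key alone (property #1)."""
    return REGION_NAMES[bisect_right(SPLIT_POINTS, row)]


# Keys are (row, family, column) tuples; timestamps omitted for brevity.
keys = [
    (row, family, column)
    for row in ("row1", "row2", "row3")
    for family in ("family1", "family2")
    for column in ("column1", "column2")
]

# One sorted store per (region, column family), as in the listing above.
store_files = defaultdict(list)
for key in sorted(keys):
    row, family, _ = key
    store_files[(region_for(row), family)].append(key)


def contains(store, key):
    """Binary search works because each store is kept sorted (property #2)."""
    i = bisect_left(store, key)
    return i < len(store) and store[i] == key


print(store_files[("A", "family1")])
print(contains(store_files[("B", "family2")], ("row3", "family2", "column1")))
```

Because the region is chosen from the row key only, every key for row1 ends up in region A no matter which column family it belongs to.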

You should work through the topic in the HBase book: http://hbase.apache.org/book/datamodel.html.

-- Lars


----- Original Message -----
From: Lin Ma <linlma@gmail.com>
To: user@hbase.apache.org; lars hofhansl <lhofhansl@yahoo.com>
Cc: 
Sent: Sunday, August 5, 2012 8:44 PM
Subject: Re: column based or row based storage for HBase?

Hi Lars,

What do you mean by a set of "keys that have the same row key" and
"colocated"? I would appreciate it if you could show an example or provide
more information.

regards,
Lin

On Mon, Aug 6, 2012 at 3:42 AM, lars hofhansl <lhofhansl@yahoo.com> wrote:

> Hi Lin,
>
> HBase stores key -> value mappings sorted by key. So it is a key value
> store.
>
> The key has internal structure, for example it starts with a row key.
> HBase makes extra guarantees about a set of keys that have the same row
> key (keeps them colocated, allows atomic operations, etc).
>
> I tried to write this up a while back:
> http://hadoop-hbase.blogspot.com/2011/12/introduction-to-hbase.html
>
> -- Lars
>
>
>
> ----- Original Message -----
> From: Lin Ma <linlma@gmail.com>
> To: user@hbase.apache.org
> Cc:
> Sent: Sunday, August 5, 2012 6:04 AM
> Subject: column based or row based storage for HBase?
>
> Hi guys,
>
> I am wondering whether HBase is using column based storage or row based
> storage?
>
>    - I read some technical documents that mention an advantage of HBase is
>    using column based storage to store similar data together to foster
>    compression. So it means the same columns of different rows are stored
>    together;
>    - But I also learned HBase is a sorted key-value map in the underlying
>    HFile. It uses the key to address all related columns for that key (row),
>    so it seems to be a row based storage?
>
> I would appreciate it if anyone could clarify my confusion. Any related
> documents or code for more details are welcome.
>
> thanks in advance,
>
> Lin
>
>

