hbase-user mailing list archives

From Lin Ma <lin...@gmail.com>
Subject Re: column based or row based storage for HBase?
Date Mon, 06 Aug 2012 03:03:57 GMT
Thank you for the informative reply, Mohit!

Some more comments:

1. Actually, my confusion about column-based storage comes from the book
"HBase: The Definitive Guide", chapter 1, section "The Dawn of Big Data",
which has a figure showing HBase storing the same column of all the different
rows contiguously in physical storage. Any comments?

2. I want to confirm that my understanding is correct: supposing I have only
one column family with 10 columns, is the physical storage row (with all
related columns) after row, rather than the 1st column of all rows, then
the 2nd column of all rows, etc.?

3. It seems that when we say column-based storage, there are two meanings: (1)
a column-oriented database => en.wikipedia.org/wiki/Column-oriented_DBMS,
where the same column of different rows is stored together, and (2) a
column-oriented architecture, e.g. how HBase is designed, which describes
a pattern for storing a large, sparse set of columns (with NULLs
for free). Any comments?
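
To make the two meanings in points 2 and 3 concrete, here is a minimal sketch of the two orderings for a single column family. It is only a toy model under my own assumptions: the table contents and the function names are made up, and real HFiles also carry timestamps, cell types, blocks, and indexes.

```python
# Toy model only; the row keys, qualifiers, and values below are hypothetical.
rows = {
    "row1": {"c1": "a1", "c2": "b1", "c3": "x1"},
    "row2": {"c1": "a2", "c3": "x2"},  # c2 is absent: no cell is stored for it
    "row3": {"c2": "b3"},
}


def hbase_like_layout(table):
    """Order cells by (row key, qualifier): all cells of a row are adjacent."""
    return sorted(
        (row, col, val) for row, cols in table.items() for col, val in cols.items()
    )


def column_store_layout(table):
    """Order cells by (qualifier, row key): all values of a column are adjacent."""
    return sorted(
        (col, row, val) for row, cols in table.items() for col, val in cols.items()
    )


hbase_cells = hbase_like_layout(rows)
column_cells = column_store_layout(rows)

for cell in hbase_cells:
    print(cell)
```

In the first ordering every cell of row1 comes before any cell of row2, which is the "row after row" layout from point 2. In the second, all c1 values come first, which is the Wikipedia sense of a column store. In both cases a missing column simply produces no cell, which is the "NULL for free" property.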


On Mon, Aug 6, 2012 at 12:08 AM, Mohit Anchlia <mohitanchlia@gmail.com> wrote:

> On Sun, Aug 5, 2012 at 6:04 AM, Lin Ma <linlma@gmail.com> wrote:
> > Hi guys,
> >
> > I am wondering whether HBase is using column based storage or row based
> > storage?
> >
> >    - I read some technical documents that mention an advantage of HBase is
> >    that it uses column based storage to store similar data together to foster
> >    compression. So it means the same columns of different rows are stored
> >    together;
> Probably what you read was in the context of column families. HBase has the
> concept of a column family, similar to Google's Bigtable, and the store files
> on disk are per column family. All columns of a given column family are kept
> together in that family's store files, and columns of a different column
> family are in different files.
> >    - But I also learned HBase is a sorted key-value map in the underlying
> >    HFile. It uses the key to address all related columns for that key (row),
> >    so it seems to be a row based storage?
> >
> Within a column family, HBase stores an entire row together, with each column
> value represented by a KeyValue. This is also called a cell in HBase.
> > I would appreciate it if anyone could clarify my confusion. Any related
> > documents or code with more details are welcome.
> >
> > thanks in advance,
> >
> > Lin
> >
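
The per-column-family layout described in the quoted reply can be sketched roughly as follows. This is a simplified illustration, not HBase code: the family names, cells, and the `split_into_stores` helper are hypothetical, and a real store can hold several files until they are compacted.

```python
from collections import defaultdict

# Hypothetical cells: (row key, column family, qualifier, timestamp, value)
cells = [
    ("row2", "info", "name", 1, "Bob"),
    ("row1", "stats", "visits", 1, "7"),
    ("row1", "info", "name", 2, "Alice B."),
    ("row1", "info", "name", 1, "Alice"),
    ("row1", "info", "email", 1, "a@example.com"),
]


def split_into_stores(all_cells):
    """Group cells by column family, then sort each group like a store file."""
    stores = defaultdict(list)
    for row, family, qualifier, ts, value in all_cells:
        stores[family].append((row, qualifier, ts, value))
    for family_cells in stores.values():
        # Row key, then qualifier, then newest timestamp first.
        family_cells.sort(key=lambda c: (c[0], c[1], -c[2]))
    return dict(stores)


stores = split_into_stores(cells)

for family, family_cells in stores.items():
    print(family, family_cells)
```

Each family ends up in its own sorted list, so reading one family never touches the other's data, while inside a family the cells of one row stay next to each other.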
