hbase-user mailing list archives

From Lin Ma <lin...@gmail.com>
Subject Re: column based or row based storage for HBase?
Date Mon, 06 Aug 2012 10:36:43 GMT
Thank you Yong,

So just to clarify one thing about your comment: "column family stores
continuously" does not mean the data is stored *column after column physically*
(e.g. store col 1 of row 1, then col 1 of row 2, then col 1 of row 3, then
col 2 of row 1, then col 2 of row 2, and finally col 2 of row 3), but rather
*row after row physically* (store col 1 of row 1, then col 2 of row 1, then
col 1 of row 2, then col 2 of row 2, then col 1 of row 3, then col 2 of
row 3)?
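
To make the second ordering concrete, here is a minimal Python sketch. It
assumes the commonly documented KeyValue sort order within one column family
(row key first, then column qualifier, then newest timestamp first). The
function name and the tuple layout are made up for illustration and are not
HBase APIs.

```python
from collections import defaultdict


def layout_by_family(cells):
    """Group cells per column family, then sort each group.

    Each cell is a (row, family, qualifier, timestamp, value) tuple.
    Assumed sort order inside a family: row key, then qualifier,
    then timestamp descending (newest version first).
    """
    stores = defaultdict(list)
    for row, family, qualifier, timestamp, value in cells:
        stores[family].append((row, qualifier, timestamp, value))
    for family in stores:
        stores[family].sort(key=lambda c: (c[0], c[1], -c[2]))
    return dict(stores)


# Cells written in arbitrary order, all in one hypothetical family "cf".
cells = [
    ("row2", "cf", "col1", 1, "c"),
    ("row1", "cf", "col2", 1, "b"),
    ("row3", "cf", "col2", 1, "f"),
    ("row1", "cf", "col1", 1, "a"),
    ("row3", "cf", "col1", 1, "e"),
    ("row2", "cf", "col2", 1, "d"),
]

order = [(row, qualifier) for row, qualifier, _, _ in layout_by_family(cells)["cf"]]
print(order)
```

Under that assumption the printed order is row 1 (col1, col2), then row 2,
then row 3, i.e. row after row within the family.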

regards,
Lin

On Mon, Aug 6, 2012 at 11:37 AM, yonghu <yongyong313@gmail.com> wrote:

> In my understanding of the column-oriented structure of HBase, the first
> thing is the term column-oriented. It means that the data which
> belongs to the same column family is stored contiguously on disk. Within
> each column family, the data is stored row by row. If you want to
> understand the internal mechanism of HBase, you'd better take a look
> at the content of an HFile.
>
> regards!
>
> Yong
>
> On Mon, Aug 6, 2012 at 5:03 AM, Lin Ma <linlma@gmail.com> wrote:
> > Thank you for the informative reply, Mohit!
> >
> > Some more comments,
> >
> > 1. Actually, my confusion about column based storage comes from the book
> > "HBase: The Definitive Guide", chapter 1, section "The Dawn of Big Data",
> > which draws a picture showing HBase storing the same column of all the
> > different rows contiguously in physical storage. Any comments?
> >
> > 2. I want to confirm my understanding is correct -- supposing I have only
> > one column family with 10 columns, the physical storage is row (with all
> > related columns) after row, rather than storing the 1st column of all rows,
> > then the 2nd column of all rows, etc.?
> >
> > 3. It seems that when we say column based storage, there are two meanings:
> > (1) a column oriented database => en.wikipedia.org/wiki/Column-oriented_DBMS,
> > where the same column of different rows is stored together, and (2) a column
> > oriented architecture, e.g. how HBase is designed, which describes a
> > pattern for storing a large number of sparse columns (with NULL
> > for free). Any comments?
> >
> > regards,
> > Lin
> >
> > On Mon, Aug 6, 2012 at 12:08 AM, Mohit Anchlia <mohitanchlia@gmail.com> wrote:
> >
> >> On Sun, Aug 5, 2012 at 6:04 AM, Lin Ma <linlma@gmail.com> wrote:
> >>
> >> > Hi guys,
> >> >
> >> > I am wondering whether HBase is using column based storage or row based
> >> > storage?
> >> >
> >> >    - I read some technical documents that mention an advantage of HBase is
> >> >    that it uses column based storage to store similar data together to
> >> >    foster compression. So it means the same columns of different rows are
> >> >    stored together;
> >>
> >>
> >> Probably what you read was in the context of column families. HBase has a
> >> concept of column family similar to Google's Bigtable, and the store files
> >> on disk are per column family. All columns of a given column family are in
> >> one store file, and columns of a different column family are in a
> >> different file.
> >>
> >>
> >> >    - But I also learned that HBase is a sorted key-value map in the
> >> >    underlying HFile. It uses the key to address all related columns for
> >> >    that key (row), so it seems to be row based storage?
> >> >
> >> HBase stores an entire row together, with each column represented by a
> >> KeyValue. This is also called a cell in HBase.
> >>
> >>
> >> > I would appreciate it if anyone could clarify my confusion. Any related
> >> > documents or code for more details are welcome.
> >> >
> >> > thanks in advance,
> >> >
> >> > Lin
> >> >
> >>
>
