hbase-user mailing list archives

From "M. C. Srivas" <mcsri...@gmail.com>
Subject Re: Disk Seeks and Column families
Date Sun, 22 Jan 2012 06:32:23 GMT
Praveen,

Basically, you are correct on all counts. If a column family has too many
columns, HBase has to issue more disk seeks to extract only the particular
columns you need. And since the data is laid out horizontally (row by row),
there are fewer common substrings within a single HBase block, so
compression quality degrades due to the reduced redundancy.
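
The layout point can be sketched in a few lines of Python. This is only an
illustration, not HBase code: the names Cell, hfile_order, build_cells and
positions_of, and the qualifier names q000..q099, are hypothetical. The sort
order (row, then family, then qualifier, then newest timestamp first) is
assumed from the dump quoted below. With 100 qualifiers per row, the cells
of one qualifier end up 100 cells apart, so reading a single column means
skipping over everything in between.

```python
from typing import NamedTuple


class Cell(NamedTuple):
    """Hypothetical stand-in for one key/value entry in an HFile dump."""
    row: str
    family: str
    qualifier: str
    timestamp: int
    value: str


def hfile_order(cell):
    # Assumed ordering: row, family, qualifier ascending; newest timestamp first.
    return (cell.row, cell.family, cell.qualifier, -cell.timestamp)


def build_cells(num_rows, num_qualifiers):
    # Dense table: every row has a value for every qualifier.
    cells = [
        Cell(f"row-{r:03d}", "colfam1", f"q{q:03d}", 1309813948188, str(q))
        for r in range(num_rows)
        for q in range(num_qualifiers)
    ]
    return sorted(cells, key=hfile_order)


def positions_of(cells, qualifier):
    # Index of every cell that belongs to the given qualifier.
    return [i for i, c in enumerate(cells) if c.qualifier == qualifier]


cells = build_cells(num_rows=4, num_qualifiers=100)
positions = positions_of(cells, "q050")
# Distance, in cells, between consecutive values of the same qualifier.
gaps = [b - a for a, b in zip(positions, positions[1:])]
print(positions, gaps)
```

In a column-oriented layout the same four values would sit next to each
other, which is what helps both scans of one column and compression.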


On Sat, Jan 21, 2012 at 9:49 AM, Praveen Sripati
<praveensripati@gmail.com> wrote:

> Thanks for the response.
>
> > The contents of a row stay together like a regular row-oriented database.
>
> > K: row-550/colfam1:50/1309813948188/Put/vlen=2 V: 50
> > K: row-550/colfam1:50/1309812287166/Put/vlen=2 V: 50
> > K: row-551/colfam1:51/1309813948222/Put/vlen=2 V: 51
> > K: row-551/colfam1:51/1309812287200/Put/vlen=2 V: 51
> > K: row-552/colfam1:52/1309813948256/Put/vlen=2 V: 52
>
> Is the above statement true for an HFile?
>
> Also, from the above example, the data for a given column family qualifier
> is not stored adjacently, so it cannot take advantage of compression (
> http://en.wikipedia.org/wiki/Column-oriented_DBMS#Compression). Is this a
> correct statement?
>
> Regards,
> Praveen
>
> On Sat, Jan 21, 2012 at 9:03 PM, <yuzhihong@gmail.com> wrote:
>
> > Have you considered using AggregationProtocol to perform aggregation?
> >
> > Thanks
> >
> >
> >
> > On Jan 20, 2012, at 11:08 PM, Praveen Sripati <praveensripati@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > 1) According to this URL (1), HBase performs well with two or three
> > > column families. Why is that?
> > >
> > > 2) A dump of an HFile looks like the one below. The contents of a row
> > > stay together like a regular row-oriented database. If the column
> > > family has 100 column family qualifiers and is dense, then the data
> > > for a particular column family qualifier is spread wide. If I want to
> > > do an aggregation on a particular column qualifier, the disk seeks
> > > don't seem to be much better than in a regular row-oriented database.
> > >
> > > Please correct me if I am wrong.
> > >
> > > K: row-550/colfam1:50/1309813948188/Put/vlen=2 V: 50
> > > K: row-550/colfam1:50/1309812287166/Put/vlen=2 V: 50
> > > K: row-551/colfam1:51/1309813948222/Put/vlen=2 V: 51
> > > K: row-551/colfam1:51/1309812287200/Put/vlen=2 V: 51
> > > K: row-552/colfam1:52/1309813948256/Put/vlen=2 V: 52
> > >
> > > (1) - http://hbase.apache.org/book/number.of.cfs.html
> > >
> > > Thanks,
> > > Praveen
> >
>
