hbase-user mailing list archives

From Anoop Sam John <anoo...@huawei.com>
Subject RE: HBase table disk usage
Date Wed, 04 Jul 2012 03:28:50 GMT

Each KeyValue (KV) is stored as:

KeyLength (4 bytes) + ValueLength (4 bytes) + RowKeyLength (2 bytes) + RowKey (... bytes) +
CF length (1 byte) + CF (... bytes) + Qualifier (... bytes) + Timestamp (8 bytes) +
Type (1 byte) + Value (... bytes)

If you are using HFile V2, a memstoreTS is also added to every KV. It is 1 to 4 bytes long
(mostly 1 byte, because the value is reset to 0 during compaction).
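A minimal Java sketch of this size arithmetic, assuming the field widths listed above and treating the memstoreTS as a caller-supplied byte count (1 byte in the common post-compaction case). The class and method names are hypothetical and are not part of the HBase API.

```java
// Sketch only: estimates the serialized size of one KV from the field widths above.
public final class KeyValueSize {

    // "Key" portion: RowKeyLength(2) + RowKey + CF length(1) + CF + Qualifier + Timestamp(8) + Type(1)
    static int keyLength(int rowLen, int familyLen, int qualifierLen) {
        return 2 + rowLen + 1 + familyLen + qualifierLen + 8 + 1;
    }

    // Whole KV: KeyLength(4) + ValueLength(4) + key + value, plus the assumed memstoreTS bytes (HFile V2)
    static int serializedSize(int rowLen, int familyLen, int qualifierLen,
                              int valueLen, int memstoreTsBytes) {
        return 4 + 4 + keyLength(rowLen, familyLen, qualifierLen) + valueLen + memstoreTsBytes;
    }

    public static void main(String[] args) {
        // Hypothetical KV: 30-byte row key, 1-byte CF, empty qualifier, empty value, 1-byte memstoreTS
        int key = keyLength(30, 1, 0);
        int total = serializedSize(30, 1, 0, 0, 1);
        System.out.println("key=" + key + " total=" + total);
    }
}
```

For that hypothetical KV this prints key=43 total=52, i.e. 9 bytes of per-KV overhead beyond the key itself even with an empty value.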

Now check whether the size you found matches the expected size.

If you are using version 0.94, there is a block encoding feature with which most of these extra
bytes, other than the key and value, can be encoded to a smaller size.
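To illustrate why sorted keys shrink under this kind of encoding, here is a simplified prefix-compression sketch in Java: each key is stored as a 1-byte shared-prefix length, a 1-byte suffix length and the suffix bytes, relative to the previous key. This is a toy assumption for illustration only, not HBase's actual data block encoders or their on-disk format; the string keys are made up.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

// Toy prefix compression over sorted keys (illustration, not HBase's encoder).
public final class PrefixEncodingSketch {

    // Number of leading bytes the two keys have in common.
    static int commonPrefix(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        int i = 0;
        while (i < n && a[i] == b[i]) {
            i++;
        }
        return i;
    }

    // Size if each key is stored as (prefixLen: 1 byte, suffixLen: 1 byte, suffix bytes).
    // Assumes keys are sorted and shorter than 256 bytes so the lengths fit in one byte.
    static int encodedSize(List<byte[]> sortedKeys) {
        int total = 0;
        byte[] prev = new byte[0];
        for (byte[] key : sortedKeys) {
            int shared = commonPrefix(prev, key);
            total += 2 + (key.length - shared);
            prev = key;
        }
        return total;
    }

    // Size with every key stored in full.
    static int rawSize(List<byte[]> keys) {
        int total = 0;
        for (byte[] key : keys) {
            total += key.length;
        }
        return total;
    }

    // Hypothetical sorted keys sharing long prefixes, as adjacent HBase keys often do.
    static List<byte[]> sampleKeys() {
        return List.of(
            "user0001/cf:a".getBytes(StandardCharsets.UTF_8),
            "user0001/cf:b".getBytes(StandardCharsets.UTF_8),
            "user0002/cf:a".getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        List<byte[]> keys = sampleKeys();
        System.out.println("raw=" + rawSize(keys) + " encoded=" + encodedSize(keys));
    }
}
```

For the three sample keys this prints raw=39 encoded=26: the repeated row and family bytes are stored once instead of in every key.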

From: Sever Fundatureanu [fundatureanu.sever@gmail.com]
Sent: Tuesday, July 03, 2012 8:36 PM
To: user@hbase.apache.org
Subject: Re: HBase table disk usage

I was only du'ing the table dir. The tmp dirs only had a couple of hundred
bytes in my case.
The HFile tool only gives avgKeyLen=46, which does not include the 4-byte
KeyLength and the 4-byte ValueLength.
Now indeed I get a total of 54 bytes/KV *1.5 billion ~= 81GB. Probably
there are also leftovers from HDFS blocks not being fully occupied.
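The per-KV arithmetic above can be written as a small Java helper, assuming avgValueLen = 0 (consistent with the 54 bytes/KV figure) and an optional per-KV memstoreTS overhead. The names are hypothetical; the inputs are the averages the HFile tool reports.

```java
// Sketch: total table size estimated from HFile tool averages.
public final class TableSizeEstimate {

    // avgKeyLen excludes the 4-byte KeyLength and 4-byte ValueLength fields, so add them back.
    // memstoreTsBytes is an assumed per-KV overhead for HFile V2 (pass 0 to ignore it).
    static long estimateBytes(long avgKeyLen, long avgValueLen, long kvCount, long memstoreTsBytes) {
        long perKv = 4 + 4 + avgKeyLen + avgValueLen + memstoreTsBytes;
        return perKv * kvCount;
    }

    public static void main(String[] args) {
        long kvs = 1_500_000_000L;
        // avgKeyLen=46 from the HFile tool; empty values assumed
        System.out.println(estimateBytes(46, 0, kvs, 0) / 1_000_000_000.0 + " GB");
    }
}
```

With memstoreTsBytes = 0 this prints 81.0 GB. Passing 1, for the 1-byte memstoreTS mentioned in the reply, gives 82,500,000,000 bytes.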


On Tue, Jul 3, 2012 at 2:29 PM, Stack <stack@duboce.net> wrote:

> On Tue, Jul 3, 2012 at 2:17 PM, Sever Fundatureanu
> <fundatureanu.sever@gmail.com> wrote:
> > Right, forgot about the timestamps. These should be a long value each, so 8
> > bytes. The versioning is set to 1 so it shouldn't count.
> > Note the column qualifier is also void on each entry.
> >
> > So now we get (33+1+8)x1.5*10^9 = 63GB, still a 19GB difference...
> >
> What about regionserver WAL logs?  Are you including these in your math or
> are you just du'ing the table dir?  The table dir can have tmp dirs
> for compaction and split work.  And, further to Michael Segel's point, the KV has a
> type byte as well as some lengths for finding offsets in the KV; take a
> looksee w/ the hfile tool:
> http://hbase.apache.org/book.html#hfile_tool2
> St.Ack

Sever Fundatureanu

Vrije Universiteit Amsterdam
E-mail: fundatureanu.sever@gmail.com