hbase-user mailing list archives

From Jianwu Wang <jia...@sdsc.edu>
Subject Re: how to get precious data size in hbase?
Date Wed, 24 Aug 2011 18:29:23 GMT
Hi Lars,

     Thanks for your info. Our data is dense and no compression is used.

     We saw a blog post on HBase architecture at
http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html.
It looks like 'hbase org.apache.hadoop.hbase.io.hfile.HFile' can provide
more detailed info for each HFile, including values such as
'totalBytes=84055'. That totalBytes value is smaller than the value
reported by "hadoop fs -dus" (84447 in the example). We are still trying
to understand what these values really mean.
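
     To make the per-value "baggage" Lars describes in the quoted message
concrete, here is a minimal Python sketch that estimates the serialized
size of one cell. It assumes the classic KeyValue layout without tags or
compression (4-byte key length, 4-byte value length, 2-byte row length,
1-byte family length, 8-byte timestamp, 1-byte type). The function name
and the sample row, family, and qualifier are hypothetical, and the
estimate ignores HFile indexes, trailers, Bloom filters, and HDFS
replication, so it will not match "hadoop fs -dus" exactly.

```python
# Fixed per-cell overhead, assuming the classic KeyValue layout (no tags).
KEY_LENGTH_FIELD = 4      # int: length of the key part
VALUE_LENGTH_FIELD = 4    # int: length of the value part
ROW_LENGTH_FIELD = 2      # short: length of the row key
FAMILY_LENGTH_FIELD = 1   # byte: length of the column family name
TIMESTAMP_FIELD = 8       # long: cell timestamp
TYPE_FIELD = 1            # byte: key type (Put, Delete, ...)

FIXED_OVERHEAD = (KEY_LENGTH_FIELD + VALUE_LENGTH_FIELD + ROW_LENGTH_FIELD
                  + FAMILY_LENGTH_FIELD + TIMESTAMP_FIELD + TYPE_FIELD)


def estimate_keyvalue_size(row: bytes, family: bytes,
                           qualifier: bytes, value: bytes) -> int:
    """Estimate the uncompressed bytes one cell occupies (hypothetical helper)."""
    return FIXED_OVERHEAD + len(row) + len(family) + len(qualifier) + len(value)


if __name__ == "__main__":
    # Hypothetical cell: a 4-byte value stored under a 10-byte row key.
    size = estimate_keyvalue_size(b"row-000001", b"cf", b"temperature", b"21.5")
    payload = len(b"21.5")
    print(f"stored bytes per cell: {size}, payload bytes: {payload}")
    print(f"expansion factor: {size / payload:.2f}")
```

     Because the row key, family, and qualifier are repeated for every
cell, short values with long names expand the most, which fits the
anecdote about dense tables growing several-fold without compression.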


On 8/22/11 10:46 PM, lars hofhansl wrote:
> Hi Jianwu,
>
>
> Are you using compression?
> Is your data sparse or dense? (I.e., for a typical row key, do all or most columns in
> your "schema" have values, or only a few?)
>
>
> With HBase you need to keep in mind that each value is tagged with (rowkey, column family
> name, column qualifier, timestamp).
> That allows it to store data in a sparse way, but also means that each value comes with
> a lot of baggage.
>
>
> I've heard somewhere that a 3T Oracle database expanded to 28T in HBase without compression
> and to about 5T with GZ compression.
> That is just an anecdote, though, and probably stems from the fact that each column in
> Oracle was transferred to HBase, even empty (null) ones.
>
>
> -- Lars
>
>
>
> ________________________________
> From: Jianwu Wang<jianwu@sdsc.edu>
> To: user@hbase.apache.org
> Sent: Monday, August 22, 2011 5:36 PM
> Subject: how to get precious data size in hbase?
>
> Hi there,
>
>      We have some data saved in HBase on HDFS. We know that the following command
> reports the file size of each HBase table: hadoop fs -dus /hbase/tableName.
>
>      For MySQL, we can get the exact data size of each table using the SQL queries shown
> at http://www.mkyong.com/mysql/how-to-calculate-the-mysql-database-size/. We can also get
> the on-disk file size using a command like: du -s /path/to/datafile. Yet the data size
> reported by the SQL query is considerably smaller than the on-disk file size reported by
> du -s. We think the above hadoop command also reports the on-disk file size, not the data
> size in the database. So we are wondering whether there is a way, like the MySQL query, to
> get the data size in HBase from the hbase shell. Thanks a lot!
>


-- 

Best wishes

Sincerely yours

Jianwu Wang
jianwu@sdsc.edu

