incubator-chukwa-user mailing list archives

From Eric Yang <>
Subject Re: Disk space sizing for HDFS/Chukwa
Date Thu, 08 Apr 2010 01:43:01 GMT
I don't have exact numbers because I don't mirror the data from HDFS to
MySQL; only selected data is copied to MySQL. If you have compression
turned on (e.g. LZO), the data on HDFS is roughly 100 times smaller than
the raw data it contains, because there is a lot of repeated data.
However, because Chukwa treats everything as strings, that raw data is
about 8 times bigger than the same data packed into MySQL. Hence, the
estimate is:

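  Compression on:   MySQL_Size = HDFS_Size * 100 / 8 = 12.5 * HDFS_Size
  Compression off:  MySQL_Size = HDFS_Size / 8

So with compression on, MySQL needs about 12.5 times the space used on
HDFS; with compression off, it needs about 8 times less.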
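For the IT sizing request, the rule of thumb above can be turned into a quick calculator. This is only a minimal sketch: the 100x compression ratio and 8x string overhead are the rough figures from this message, not measured values, and `estimate_mysql_size` is a hypothetical helper name. It also assumes all HDFS data is copied to MySQL, whereas in practice only selected data is, so the result is best read as an upper bound.

```python
# Rough MySQL sizing sketch based on the ratios quoted in this thread.
# Assumptions (rules of thumb, not measurements):
#   - compressed (e.g. LZO) Chukwa data on HDFS expands ~100x when uncompressed
#   - Chukwa stores values as strings, ~8x larger than MySQL's packed form
#   - ALL HDFS data is copied to MySQL (in practice only selected data is)

COMPRESSION_RATIO = 100  # raw size / compressed size on HDFS (assumed)
STRING_OVERHEAD = 8      # string size / MySQL packed size (assumed)


def estimate_mysql_size(hdfs_size: float, compressed: bool = True) -> float:
    """Hypothetical helper: estimated MySQL size, same unit as hdfs_size."""
    # Undo HDFS compression first (if any) to get the raw string size.
    raw_size = hdfs_size * COMPRESSION_RATIO if compressed else hdfs_size
    # Packing strings into MySQL columns shrinks them by ~STRING_OVERHEAD.
    return raw_size / STRING_OVERHEAD


if __name__ == "__main__":
    hdfs_tb = 10.0  # e.g. 10 TB on HDFS, as in the question below
    print(f"compressed:   {estimate_mysql_size(hdfs_tb):.2f} TB")
    print(f"uncompressed: {estimate_mysql_size(hdfs_tb, compressed=False):.2f} TB")
```

With 10 TB on HDFS this gives 125 TB for MySQL if the HDFS data is compressed and 1.25 TB if it is not, before accounting for only a subset of the data being loaded.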
Hope this helps.


On 4/7/10 2:56 PM, "Kirk True" <> wrote:

> Hi all,
> It's my understanding (based on the image in
> that the
> structured data lives in HDFS forever.
> When data is migrated from HDFS to MySQL for use in HICC, how does the
> MySQL disk usage compare to the HDFS disk usage? That is, if I'm using
> 10 TB to store my data in HDFS, what will it be when it moves
> over to MySQL? Is it 2x, 10x, or ???
> I'm getting requests for disk size estimates from the IT guys handling
> our staging area and I'm not really sure how to gauge disk usage for MySQL.
> Thanks,
> Kirk
