hadoop-hdfs-issues mailing list archives

From "Hairong Kuang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-1435) Provide an option to store fsimage compressed
Date Mon, 04 Oct 2010 21:22:34 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917768#action_12917768 ]

Hairong Kuang commented on HDFS-1435:

I thought more about Philip's suggestion. Instead of changing the fsimage format, I now have an option
that simply compresses the whole image file; when loading the fsimage, the image is decompressed
if the image file name ends with a compression suffix.
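
A minimal sketch of the suffix-based approach, assuming gzip as the codec and ".gz" as the suffix (both hypothetical; the comment only says "a compression suffix"). The helper names save_image and load_image are illustrative, not HDFS APIs. The whole file is compressed on save, and the loader decides what to do purely from the file name, so the image bytes themselves need no format change.

```python
import gzip
from pathlib import Path

# Hypothetical suffix; the comment above only says "a compression suffix".
COMPRESSED_SUFFIX = ".gz"


def save_image(raw: bytes, path: Path, compress: bool) -> Path:
    """Write the image; when compression is on, gzip the whole file and append the suffix."""
    if compress:
        path = path.with_name(path.name + COMPRESSED_SUFFIX)
        with gzip.open(path, "wb") as f:
            f.write(raw)
    else:
        path.write_bytes(raw)
    return path


def load_image(path: Path) -> bytes:
    """Return the raw image bytes, decompressing only if the name ends with the suffix."""
    if path.name.endswith(COMPRESSED_SUFFIX):
        with gzip.open(path, "rb") as f:
            return f.read()
    return path.read_bytes()
```

Because the loader keys only on the suffix, an image that an admin compresses with the standard gzip tool (for example, turning "fsimage" into "fsimage.gz") is read the same way as one written with compression enabled.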

This has a couple of advantages over my original idea:
1. There is no need to change the layout version.
2. It gives admins more flexibility: they can use existing tools to compress the fsimage even if
HDFS is not configured to compress it.

I also ran a few experiments with different compression algorithms. I tried both gzip and
LZO on a 13G fsimage, each at its default compression level.
Gzip took 13 minutes to compress the 13G fsimage down to 2.3G, and decompression took 2 minutes
47 seconds.
LZO took only 3 minutes to compress the 13G fsimage down to 3G, and decompression took
2 minutes 51 seconds.
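
The measurements above can be turned into ratios and throughputs with a few lines of arithmetic. This is a sketch that only uses the reported numbers; it assumes "G" means GiB (1 GiB = 1024 MiB), and the summarize helper is illustrative.

```python
# Figures from the experiment above: a 13G fsimage, default compression levels.
ORIGINAL_GB = 13.0
results = {
    # codec: (compressed size in GB, compress seconds, decompress seconds)
    "gzip": (2.3, 13 * 60, 2 * 60 + 47),
    "lzo": (3.0, 3 * 60, 2 * 60 + 51),
}


def summarize(original_gb, compressed_gb, compress_s, decompress_s):
    """Derive ratio, space saved, and throughput (assuming 1 GB = 1024 MB)."""
    original_mb = original_gb * 1024
    return {
        "ratio": original_gb / compressed_gb,
        "space_saved_pct": 100.0 * (1 - compressed_gb / original_gb),
        "compress_mb_s": original_mb / compress_s,
        "decompress_mb_s": original_mb / decompress_s,
    }


for codec, (size, c_s, d_s) in results.items():
    s = summarize(ORIGINAL_GB, size, c_s, d_s)
    print(f"{codec}: ratio {s['ratio']:.2f}x, saved {s['space_saved_pct']:.1f}%, "
          f"compress {s['compress_mb_s']:.0f} MB/s, decompress {s['decompress_mb_s']:.0f} MB/s")
```

Under that assumption gzip works out to about 5.65x at roughly 17 MB/s compression, while LZO reaches about 4.33x at roughly 74 MB/s. Decompression is nearly identical for both (about 80 vs. 78 MB/s).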

These are very promising results. I think the fsimage contains a lot of duplicate bytes, so it
compresses really well. It is also clear that LZO provides good compression speed and a
good-enough compression ratio.
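
To illustrate why data with many repeated bytes compresses well, here is a toy sketch using zlib (the DEFLATE codec behind gzip) at its default level. The records are a hypothetical stand-in for namespace content: many paths sharing long prefixes plus repeated owner, group, and permission fields. They are not the real fsimage format.

```python
import zlib


def make_toy_records(n: int) -> bytes:
    # Hypothetical stand-in for fsimage entries: long shared path prefixes and
    # identical owner/group/permission fields, differing only in a file number.
    lines = [
        f"/user/hive/warehouse/logs/2010/10/04/part-{i:05d}\thdfs\tsupergroup\trw-r--r--\n"
        for i in range(n)
    ]
    return "".join(lines).encode("ascii")


raw = make_toy_records(10_000)
packed = zlib.compress(raw)  # default level, as in the gzip experiment
print(f"{len(raw)} bytes -> {len(packed)} bytes ({len(raw) / len(packed):.1f}x)")
```

Each record differs from the previous one by only a few digits, so DEFLATE can encode most of every line as a back-reference. A real fsimage is binary and less uniform, so its ratio will differ.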

> Provide an option to store fsimage compressed
> ---------------------------------------------
>                 Key: HDFS-1435
>                 URL: https://issues.apache.org/jira/browse/HDFS-1435
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>    Affects Versions: 0.22.0
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.22.0
> Our HDFS has an fsimage as big as 20G bytes. It consumes a lot of network bandwidth when the
> secondary NN uploads a new fsimage to the primary NN.
> If we could store the fsimage compressed, the problem could be greatly alleviated.
> I plan to provide a new configuration, hdfs.image.compressed, with a default value of false.
> If it is set to true, the fsimage is stored compressed.
> The fsimage will have a new layout with a new field "compressed" in its header, indicating
> whether the namespace is stored compressed.
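
The header-flag design in the issue description can be sketched as follows. The header layout here (a 4-byte layout version followed by a 1-byte "compressed" flag), the LAYOUT_VERSION value, and the helper names are all hypothetical and do not reflect the real fsimage header. Only the configuration key hdfs.image.compressed and its default of false come from the description.

```python
import struct
import zlib

# Hypothetical header: 4-byte signed layout version, then a 1-byte "compressed" flag.
HEADER = struct.Struct(">iB")
LAYOUT_VERSION = -25  # hypothetical value, for illustration only


def image_compression_enabled(conf: dict) -> bool:
    # hdfs.image.compressed defaults to false, as proposed in the issue.
    return str(conf.get("hdfs.image.compressed", "false")).lower() == "true"


def write_image(namespace: bytes, conf: dict) -> bytes:
    """Serialize the header, then the namespace, compressed only if configured."""
    compressed = image_compression_enabled(conf)
    body = zlib.compress(namespace) if compressed else namespace
    return HEADER.pack(LAYOUT_VERSION, int(compressed)) + body


def read_image(image: bytes) -> bytes:
    """Check the layout version, then use the header flag to decide on decompression."""
    version, flag = HEADER.unpack_from(image, 0)
    if version != LAYOUT_VERSION:
        raise ValueError(f"unsupported layout version {version}")
    body = image[HEADER.size:]
    return zlib.decompress(body) if flag else body
```

Adding the flag changes the on-disk header, which is why this design needs a new layout version. A reader that checks the version can then reject images it does not understand.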

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
