hadoop-hdfs-issues mailing list archives

From "Hairong Kuang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-1435) Provide an option to store fsimage compressed
Date Tue, 05 Oct 2010 21:41:37 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918222#action_12918222 ]

Hairong Kuang commented on HDFS-1435:

The LZO compression codec is not included in the standard Hadoop package, so the compression
algorithm has to be configurable.

If we compress the entire image file, the challenge is deciding where to put the compression
algorithm information.
Dhruba suggested storing this information in the VERSION file. This idea is neat. The only problem
is that saving the fsimage would then need to touch two files, and it's hard to guarantee atomicity.

Another solution is to add a suffix to the image file name to indicate the compression algorithm.
The problem is that the image file then no longer has a unique name, so one storage directory
could contain multiple fsimages. How do we handle this?

After discussions back and forth, I am leaning toward the approach that I originally
proposed: changing the binary format. That way we can store the compression algorithm information
in the fsimage header, and we don't need to deal with any of the complexity that
compressing the entire image file presents.
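
To make the header-based approach concrete, here is a minimal sketch of the idea, not the actual fsimage format or HDFS code. The class name, the placeholder layout version, and the field order are all assumptions for illustration, and gzip from java.util.zip stands in for whatever configurable codec Hadoop would load. The header is written uncompressed so a reader can learn the codec from the same file before it decodes the namespace, which avoids touching a second file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch only: this is not the real fsimage layout.
class ImageHeaderSketch {
    // Placeholder value, not an actual HDFS layout version.
    static final int LAYOUT_VERSION = -1;

    // Writes an uncompressed header (version, compressed flag, codec name),
    // then the namespace bytes, compressed only if a codec is given.
    static byte[] save(byte[] namespace, String codec) throws IOException {
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        DataOutputStream header = new DataOutputStream(file);
        header.writeInt(LAYOUT_VERSION);
        header.writeBoolean(codec != null);
        if (codec != null) {
            header.writeUTF(codec);
        }
        header.flush();
        OutputStream body = wrapOutput(file, codec);
        body.write(namespace);
        body.close(); // finishes the compressed stream, if any
        return file.toByteArray();
    }

    // Reads the header first, then picks the decoder it names.
    static byte[] load(byte[] image) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(image));
        int version = in.readInt();
        if (version != LAYOUT_VERSION) {
            throw new IOException("Unexpected layout version: " + version);
        }
        String codec = in.readBoolean() ? in.readUTF() : null;
        InputStream body = wrapInput(in, codec);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = body.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    // Only "gzip" is wired up here; a real implementation would look the
    // codec up from configuration instead of hard-coding it.
    static OutputStream wrapOutput(OutputStream out, String codec) throws IOException {
        if (codec == null) return out;
        if (codec.equals("gzip")) return new GZIPOutputStream(out);
        throw new IOException("Unsupported codec: " + codec);
    }

    static InputStream wrapInput(InputStream in, String codec) throws IOException {
        if (codec == null) return in;
        if (codec.equals("gzip")) return new GZIPInputStream(in);
        throw new IOException("Unsupported codec: " + codec);
    }
}
```

With this shape, saving still writes a single file, and an image written without compression stays readable by the same loader because the flag in the header says so.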

What does the community think?

> Provide an option to store fsimage compressed
> ---------------------------------------------
>                 Key: HDFS-1435
>                 URL: https://issues.apache.org/jira/browse/HDFS-1435
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>    Affects Versions: 0.22.0
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.22.0
> Our HDFS has an fsimage as big as 20 GB. It consumes a lot of network bandwidth when the
> secondary NN uploads a new fsimage to the primary NN.
> If we could store the fsimage compressed, the problem could be greatly alleviated.
> I plan to provide a new configuration, hdfs.image.compressed, with a default value of false.
> If it is set to true, the fsimage is stored compressed.
> The fsimage will have a new layout with a new field "compressed" in its header, indicating
> whether the namespace is stored compressed or not.
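
The default-off switch proposed in the issue description could be read roughly as follows. This is a sketch under stated assumptions: java.util.Properties stands in for Hadoop's configuration object so the example is self-contained, and the class and method names are hypothetical. Only the key hdfs.image.compressed and its default of false come from the description above.

```java
import java.util.Properties;

// Hypothetical helper; Properties is a stand-in for Hadoop's configuration.
class ImageCompressionConfigSketch {
    // Key name taken from the issue description.
    static final String KEY = "hdfs.image.compressed";

    // Returns false unless the key is explicitly set to "true".
    static boolean isImageCompressed(Properties conf) {
        return Boolean.parseBoolean(conf.getProperty(KEY, "false"));
    }
}
```

Defaulting to false means existing deployments keep writing uncompressed images until an operator opts in.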

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
