hadoop-common-dev mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4764) add replication factor for hdfs directory
Date Fri, 05 Dec 2008 22:45:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653956#action_12653956 ]

Konstantin Shvachko commented on HADOOP-4764:

Ruyue Ma,
# Currently, recursive setReplication() for a directory is handled on the client side. The
name-node can change replication only for individual files. So I was proposing to optimize
that by handling recursive replication changes on the name-node itself.
# I do not understand your proposal to change only the edits file format. Does that mean fsimage
will not store replication for directories? If so, then after the second name-node restart you
lose all replication settings for existing directories, right?
# I did not get what exactly the semantics of directory replication are: does it apply to direct
children only or to the whole subtree?
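
To make the first point concrete, here is a minimal sketch of what a client-side recursive walk looks like when the only available primitive is a per-file call. It uses a toy in-memory tree, not the HDFS API: `File`, `Directory`, `set_replication`, and `set_replication_recursive` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class File:
    replication: int


@dataclass
class Directory:
    # Maps a child name to either a file or a nested directory.
    children: Dict[str, Union["File", "Directory"]] = field(default_factory=dict)


def set_replication(f: File, replication: int) -> None:
    """Hypothetical stand-in for the per-file operation (not the real HDFS call)."""
    f.replication = replication


def set_replication_recursive(node: Union[File, Directory], replication: int) -> int:
    """Walk the subtree on the 'client' and issue one per-file call per file.

    Returns the number of per-file calls that were made.
    """
    if isinstance(node, File):
        set_replication(node, replication)
        return 1
    return sum(
        set_replication_recursive(child, replication)
        for child in node.children.values()
    )


tree = Directory({
    "a": File(3),
    "sub": Directory({"b": File(3), "c": File(3)}),
})
calls = set_replication_recursive(tree, 2)
print(calls)  # one call per file in the subtree
```

In this model the number of calls grows with the number of files under the directory, which is the cost that handling the recursion on the name-node would avoid.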

Zheng Shao,
Which part of the source code will be responsible for reading the dummy file and retrieving
the default replication? If it is done on the application level, I am fine with it. If the
name-node has to create an extra file for every directory, then I am very concerned.

Adding fields to file/directory inodes or creating additional "system" files increases
the memory footprint of the namespace (a larger heap is required to store the same number
of files). With directories it is not so critical, because there are not so many of them, but
it should still be justified.

Based on my estimate of the required code changes, I should mention that what you propose (introducing
replication for directories) is a new feature and should go through all the steps required
for new features, which include:
- motivation;
- design document;
- implementation (patch);
- test planning, testing, and support.

So if you feel you have enough bandwidth, enthusiasm, etc. to do that, let's start with
the design proposal.

> add replication factor for hdfs directory
> -----------------------------------------
>                 Key: HADOOP-4764
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4764
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: Ruyue Ma
>            Assignee: Ruyue Ma
> If we can set a replication factor for a directory, we can modify the DFSClient.create() method
> to pass 0 for the default block replication. When the name-node checks a create request and the
> block replication is 0, it will use the parent directory's replication factor as the file's block
> replication factor. This will simplify administration work. For example, we can set the replication
> factor of the /Test or /Tmp directory to 2 or 1, and then all of their child files and directories
> will have a replication factor of 2 or 1 by default.
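
The rule described in the issue, together with the open question in the comment about direct children versus the whole subtree, can be sketched as follows. This is only a toy model under stated assumptions, not name-node code: the `Namespace` class, its methods, the `whole_subtree` switch, and the fallback `DEFAULT_REPLICATION` value are all hypothetical and chosen for illustration.

```python
import posixpath
from typing import Dict

# Assumed fallback for this sketch when no directory setting applies.
DEFAULT_REPLICATION = 3


class Namespace:
    """Toy model of the proposed rule: replication 0 means 'inherit'."""

    def __init__(self) -> None:
        self.dir_replication: Dict[str, int] = {}   # directory path -> factor
        self.file_replication: Dict[str, int] = {}  # file path -> factor

    def set_dir_replication(self, path: str, replication: int) -> None:
        self.dir_replication[path] = replication

    def _inherited(self, parent: str, whole_subtree: bool) -> int:
        path = parent
        while True:
            if path in self.dir_replication:
                return self.dir_replication[path]
            # "Direct children only": stop after checking the immediate parent.
            if not whole_subtree or path == "/":
                return DEFAULT_REPLICATION
            # "Whole subtree": keep looking at the next ancestor.
            path = posixpath.dirname(path)

    def create(self, path: str, replication: int = 0,
               whole_subtree: bool = True) -> int:
        """Create a file; a requested replication of 0 is resolved by inheritance."""
        if replication == 0:
            replication = self._inherited(posixpath.dirname(path), whole_subtree)
        self.file_replication[path] = replication
        return replication


ns = Namespace()
ns.set_dir_replication("/Tmp", 1)
print(ns.create("/Tmp/f"))                         # inherits from /Tmp
print(ns.create("/Tmp/a/f"))                       # subtree semantics
print(ns.create("/Tmp/a/f", whole_subtree=False))  # direct-children semantics
print(ns.create("/Tmp/g", replication=2))          # explicit value wins
```

The two calls on `/Tmp/a/f` give different answers depending on which semantics is chosen, which is why a design proposal would need to settle that question.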

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
