hadoop-hdfs-issues mailing list archives

From "Harsh J (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-2936) Provide a better way to specify a HDFS-wide minimum replication requirement
Date Mon, 14 May 2012 02:59:48 GMT

     [ https://issues.apache.org/jira/browse/HDFS-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J updated HDFS-2936:
--------------------------

    Attachment: HDFS-2936.patch

I had this done around submission time, but lost my changes when my Mac's disk crashed.
I re-did the separation logic over this weekend.

Briefly, for those interested in reviewing it, the patch makes these changes:
* The role of the current {{dfs.namenode.replication.min}} has been narrowed to user-level
restrictions only: the replication factor requested at file creation and later adjustments
to a file's replication factor.
* A new property, {{dfs.namenode.replication.min.for.write}}, applies to all write conditions,
such as adding a block, closing a block, etc. The former property used to control these layers
as well; I've split it out into a separate property so that such hard guarantees aren't forced
on anything beyond user-level restrictions unless explicitly configured. (See the configuration
sketch after this list.)
* There were no tests for min-replication, so I added some to TestFileCreation.
* I added a few regression tests for this property split to TestReplication.
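
Below is a minimal configuration sketch of how the split could look once this patch is applied.
It uses the property names described above (this is still WIP, so the final names may differ)
and a MiniDFSCluster purely for illustration:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.MiniDFSCluster;

public class MinReplicationSplitExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // User-level restriction only: creation-time replication factors and
    // later setReplication() calls must request at least this many replicas.
    conf.setInt("dfs.namenode.replication.min", 2);
    // Write-path hard guarantee: adding and completing blocks requires this
    // many live replicas; leaving it at 1 lets files close normally even
    // when the pipeline is degraded.
    conf.setInt("dfs.namenode.replication.min.for.write", 1);

    MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf).numDataNodes(3).build();
    try {
      cluster.waitActive();
      // ... create files and verify replication behaviour here ...
    } finally {
      cluster.shutdown();
    }
  }
}
{code}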

This patch is still WIP: it covers the good conditions, but still needs a test for the
bad/violation conditions. The last test in TestReplication needs more work before it can run
reliably (and not hang at shutdown). The problem is that the DFSClient.close() call never
returns if it can't close the file for min-replication reasons, and it silently swallows
InterruptedException, which makes it difficult to write a wait-for-X-seconds-then-interrupt
test. I'll find another way and fix that shortly (unless we decide to limit the completeFile
retries, which are currently infinite and retried every 0.4 seconds).
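
For reference, here is a rough sketch of the wait-then-interrupt idea; the helper and timeout
are hypothetical and not part of the patch, and it only becomes a reliable test once close()
can actually be interrupted or the completeFile retries are bounded:

{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;

class CloseWithTimeout {
  /** Attempts to close the stream, giving up (via interrupt) after timeoutMs. */
  static void closeOrInterrupt(final FSDataOutputStream out, long timeoutMs)
      throws InterruptedException {
    Thread closer = new Thread(new Runnable() {
      @Override
      public void run() {
        try {
          // Hangs today: completeFile() is retried forever while min replication is unmet.
          out.close();
        } catch (IOException e) {
          // Expected once the close attempt is aborted.
        }
      }
    });
    closer.start();
    closer.join(timeoutMs);
    if (closer.isAlive()) {
      // Currently ineffective, since DFSClient swallows InterruptedException;
      // that is exactly what makes this test hard to write reliably.
      closer.interrupt();
      closer.join();
    }
  }
}
{code}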
                
> Provide a better way to specify a HDFS-wide minimum replication requirement
> ---------------------------------------------------------------------------
>
>                 Key: HDFS-2936
>                 URL: https://issues.apache.org/jira/browse/HDFS-2936
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>    Affects Versions: 0.23.0
>            Reporter: Harsh J
>            Assignee: Harsh J
>         Attachments: HDFS-2936.patch
>
>
> Currently, an admin who would like to enforce a minimum replication factor for all files on
an HDFS cluster has no good way to do so. Setting dfs.replication.min is arguably an option,
but that is a very hard guarantee: if the pipeline can't provide that many replicas due to
some reason/failure, close() does not succeed on the file being written, which leads to
several issues.
> After discussing with Todd, we feel it would make sense to introduce a second config
(defaulting to ${dfs.replication.min}) that would act as the minimum specified replication
for files. This is different from dfs.replication.min, which also ensures that many replicas
are recorded before completeFile() returns... perhaps something like ${dfs.replication.min.user}.
Alternatively, we can leave dfs.replication.min alone for hard guarantees and add
${dfs.replication.min.for.block.completion}, which could be left at 1 even if dfs.replication.min
is >1, letting files complete normally while still not being allowed a low replication factor
(so they can be monitored and accounted for later).
> I prefer the second option myself. Will post a patch with tests soon.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
