hadoop-common-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-170) setReplication and related bug fixes
Date Wed, 26 Apr 2006 21:31:04 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-170?page=comments#action_12376558 ] 

Doug Cutting commented on HADOOP-170:

We should start setting a high replication count in MapReduce for submitted job files, which
are read by every node.  But how should we use this?  Can we simply do something like setReplication(Integer.MAX_VALUE)
and have the file replicated as widely as the fs thinks is useful (once per node, once per rack,
or once per nodes/10, whichever it chooses)?
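
The idea in the question above can be sketched as a clamp: the caller asks for an arbitrarily high count and the file system reduces it to whatever ceiling it considers useful. This is only an illustration of that idea. The class `ReplicationClamp`, the `Policy` enum, and both methods are hypothetical names invented for this sketch and are not part of Hadoop or of the patch; the three policies simply mirror the three options listed in the comment.

```java
// Hypothetical sketch; none of these names come from Hadoop.
final class ReplicationClamp {

    /** The three candidate ceilings named in the comment: per node, per rack, per ten nodes. */
    enum Policy { ONCE_PER_NODE, ONCE_PER_RACK, ONCE_PER_TEN_NODES }

    private ReplicationClamp() {}

    /** Highest replication this sketch treats as useful; never less than 1. */
    static int maxUseful(Policy policy, int dataNodes, int racks) {
        int ceiling;
        switch (policy) {
            case ONCE_PER_NODE:      ceiling = dataNodes;      break;
            case ONCE_PER_RACK:      ceiling = racks;          break;
            case ONCE_PER_TEN_NODES: ceiling = dataNodes / 10; break;
            default: throw new AssertionError(policy);
        }
        return Math.max(1, ceiling);
    }

    /** Clamp the requested replication to the ceiling chosen by the policy. */
    static int effectiveReplication(int requested, Policy policy, int dataNodes, int racks) {
        if (requested < 1) {
            throw new IllegalArgumentException("replication must be at least 1: " + requested);
        }
        return Math.min(requested, maxUseful(policy, dataNodes, racks));
    }

    public static void main(String[] args) {
        // Assumed cluster shape for the demo: 40 data nodes spread over 4 racks.
        for (Policy p : Policy.values()) {
            System.out.println(p + " -> " + effectiveReplication(Integer.MAX_VALUE, p, 40, 4));
        }
    }
}
```

With this shape, a caller that passes Integer.MAX_VALUE never needs to know the cluster size, and an ordinary request such as 3 passes through unchanged as long as it is below the ceiling.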

> setReplication and related bug fixes
> ------------------------------------
>          Key: HADOOP-170
>          URL: http://issues.apache.org/jira/browse/HADOOP-170
>      Project: Hadoop
>         Type: Improvement

>   Components: fs, dfs
>     Versions: 0.1.1
>     Reporter: Konstantin Shvachko
>     Assignee: Konstantin Shvachko
>  Attachments: setReplication.patch
> With variable replication (HADOOP-51) in place, it is natural to be able to
> change the replication of existing files. This patch introduces that functionality.
> Here is a detailed list of issues addressed by the patch.
> 1) setReplication() and getReplication() methods are implemented.
> 2) DFSShell prints file replication for any listed file.
> 3) Bug fix. FSDirectory.delete() logs the delete operation even if it is not successful.
> 4) Bug fix. This is a distributed bug.
> Suppose that file replication is 3, and a client reduces it to 1.
> Two data nodes will be chosen to remove their copies, and will do that.
> After a while they will report to the name node that the copies have actually been deleted.
> Until they report, the name node assumes the copies still exist.
> Now the client decides to increase replication back to 3 BEFORE the data nodes
> have reported that the copies are deleted. Then the name node can choose one of the data nodes
> that it thinks has a block copy, but that may already have deleted it, to replicate the block to new data nodes.
> This scenario is quite unusual, but it is possible even without variable replication.
> 5) Logging for name and data nodes is improved in several cases.
> E.g. data nodes never logged that they deleted a block.
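
The race in item 4 can be modelled in a few lines. The sketch below is an assumption-laden illustration, not the code in setReplication.patch: the class `ReplicaSourceChooser` and all of its methods are hypothetical, and excluding nodes with an unconfirmed deletion is just one plausible guard, not necessarily the one the patch uses. It tracks which data nodes the name node believes hold a block and which of those have been told to delete it but have not yet reported back.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of the name node's view of one block; not Hadoop code.
final class ReplicaSourceChooser {
    // Nodes the name node believes hold a copy (stays true until deletion is reported).
    private final Set<String> believedHolders = new TreeSet<>();
    // Nodes told to delete their copy that have not yet confirmed.
    private final Set<String> pendingDeletion = new TreeSet<>();

    void addReplica(String node) {
        believedHolders.add(node);
    }

    /** The name node asks a holder to drop its copy; the copy may vanish at any time after this. */
    void scheduleDeletion(String node) {
        if (believedHolders.contains(node)) {
            pendingDeletion.add(node);
        }
    }

    /** The data node reports that the copy is really gone. */
    void confirmDeletion(String node) {
        believedHolders.remove(node);
        pendingDeletion.remove(node);
    }

    /** Buggy choice: any believed holder, including ones that may already have deleted the block. */
    List<String> naiveSources() {
        return new ArrayList<>(believedHolders);
    }

    /** Guarded choice (an assumption of this sketch): skip nodes with an unconfirmed deletion. */
    List<String> safeSources() {
        List<String> sources = new ArrayList<>(believedHolders);
        sources.removeAll(pendingDeletion);
        return sources;
    }

    public static void main(String[] args) {
        ReplicaSourceChooser block = new ReplicaSourceChooser();
        block.addReplica("dn1");
        block.addReplica("dn2");
        block.addReplica("dn3");
        // Replication reduced from 3 to 1: two nodes are told to delete.
        block.scheduleDeletion("dn2");
        block.scheduleDeletion("dn3");
        // Replication raised back to 3 before either node has reported.
        System.out.println("naive: " + block.naiveSources());
        System.out.println("safe:  " + block.safeSources());
    }
}
```

In the demo, the naive choice still offers dn2 and dn3 as replication sources even though they may no longer have the block, while the guarded choice offers only dn1.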

This message is automatically generated by JIRA.
If you think it was sent incorrectly contact one of the administrators:
For more information on JIRA, see:
