hadoop-common-dev mailing list archives

From "Bryan Pendleton (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-51) per-file replication counts
Date Sat, 08 Apr 2006 19:16:13 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-51?page=comments#action_12373745 ] 

Bryan Pendleton commented on HADOOP-51:
---------------------------------------

Great!

A few comments from reading the patch (haven't tested with it yet):
1) The <description> for dfs.replication.min is wrong
2) This is a wider concern about coding style: the idiom conf.getType("config.value", defaultValue)
makes sense for user-defined values, but shouldn't the hard-coded default be dropped for properties
that are already defined in hadoop-default.xml? Repeating the default in code undercuts the value
of hadoop-default.xml, and it means editing that file may or may not have the desired system-wide
effect (see the sketch after this list).
3) Wouldn't it be better to log a severe-level message when a replication is set below minReplication
or above maxReplication, and just clamp the replication to the nearest bound? Replication is set
per-file by the application, while min and max are probably set by the administrator of the hadoop
cluster, so throwing an IOException turns a policy mismatch into a hard failure where degraded
performance would be preferable (rough sketch after this list).
4) I may be dense, but I didn't see any way to specify that replication be "full", i.e., one copy
per datanode. I got the feeling this was one of the things this feature was wanted for (e.g., for
job.jar files, job configs, and lookup data used widely across a job). Using a short means that if
we ever scale past 32k nodes there would be no way to request this explicitly, and simply passing
Short.MAX_VALUE means getting a lot of errors about not being able to replicate as fully as
desired.
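
To make (2) concrete, here is roughly what I mean. This is only an illustration I wrote, not code
from the patch, and the key name is just an example:

    // Style A: literal default repeated at the call site.  If someone edits
    // hadoop-default.xml, this call still falls back to 1 when the key is absent.
    int minRepl = conf.getInt("dfs.replication.min", 1);

    // Style B: let hadoop-default.xml be the single source of the default.
    // (Illustration only -- assumes the key always ships in hadoop-default.xml.)
    String value = conf.get("dfs.replication.min");
    if (value == null) {
      throw new RuntimeException("dfs.replication.min missing from configuration");
    }
    int minReplFromDefaults = Integer.parseInt(value);

With style A, the literal 1 silently wins at any call site that isn't kept in sync with
hadoop-default.xml.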

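For (3), here's a rough sketch of the behaviour I'd prefer; the method and field names are made up
for the example and are not taken from Replication.patch:

    // Clamp the requested per-file replication into [minReplication, maxReplication]
    // and log loudly, instead of failing the create with an IOException.
    private short clampReplication(short requested) {
      if (requested < minReplication) {
        LOG.severe("Replication " + requested + " is below dfs.replication.min ("
                   + minReplication + "); using the minimum instead.");
        return minReplication;
      }
      if (requested > maxReplication) {
        LOG.severe("Replication " + requested + " exceeds dfs.replication.max ("
                   + maxReplication + "); using the maximum instead.");
        return maxReplication;
      }
      return requested;
    }
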
Otherwise, this looks like a wonderful patch!

> per-file replication counts
> ---------------------------
>
>          Key: HADOOP-51
>          URL: http://issues.apache.org/jira/browse/HADOOP-51
>      Project: Hadoop
>         Type: New Feature
>   Components: dfs
>     Versions: 0.2
>     Reporter: Doug Cutting
>     Assignee: Konstantin Shvachko
>      Fix For: 0.2
>  Attachments: Replication.patch
>
> It should be possible to specify different replication counts for different files.  Perhaps
> an option when creating a new file should be the desired replication count.  MapReduce should
> take advantage of this feature so that job.xml and job.jar files, which are frequently accessed
> by lots of machines, are more highly replicated than large data files.


