hadoop-hdfs-dev mailing list archives

From "Sirianni, Eric" <Eric.Siria...@netapp.com>
Subject mapred replication
Date Thu, 15 Aug 2013 13:21:13 GMT
In debugging some replication issues in our HDFS environment, I noticed that the MapReduce
framework uses the following algorithm for setting the replication on submitted job files:

1.     Create the file with the *default* DFS replication factor (i.e. 'dfs.replication')

2.     Subsequently alter the replication of the file based on the 'mapred.submit.replication'
config value

  private static FSDataOutputStream createFile(FileSystem fs, Path splitFile,
      Configuration job)  throws IOException {
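    // step 1: file is created with the default dfs.replication (no replication argument here)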
    FSDataOutputStream out = FileSystem.create(fs, splitFile,
        new FsPermission(JobSubmissionFiles.JOB_FILE_PERMISSION));
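    // step 2: target replication is then adjusted to mapred.submit.replication afterwards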
    int replication = job.getInt("mapred.submit.replication", 10);
    fs.setReplication(splitFile, (short)replication);
    writeSplitHeader(out);
    return out;
  }

If I understand correctly, the net functional effect of this approach is that:

-       The initial write pipeline is set up with 'dfs.replication' nodes (i.e. 3)

-       The namenode then triggers additional inter-datanode replications in the background as
it detects the blocks as "under-replicated" (see the sketch below).
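
A rough, hand-written sketch (not framework code; dumpReplicationState is just an
illustrative name) of how this looks through the FileSystem API:

  // The *target* replication changes immediately after setReplication(),
  // but the extra replicas only appear as the namenode re-replicates the
  // blocks in the background.
  private static void dumpReplicationState(FileSystem fs, Path splitFile)
      throws IOException {
    FileStatus status = fs.getFileStatus(splitFile);
    // reports the new target right away, e.g. 10
    System.out.println("target replication = " + status.getReplication());
    for (BlockLocation loc :
         fs.getFileBlockLocations(status, 0, status.getLen())) {
      // right after job submission this typically still lists only the
      // dfs.replication (e.g. 3) datanodes from the original write pipeline
      System.out.println("replica hosts = "
          + java.util.Arrays.toString(loc.getHosts()));
    }
  }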

I'm assuming this is intentional?  If mapred.submit.replication were instead specified on
the initial create, the write pipeline would be significantly larger.
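
For comparison, a rough sketch (not the actual framework code; the method name and the
buffer/block-size choices are just placeholders) of what passing the replication on the
initial create would look like, using the FileSystem.create overload that takes a
replication factor:

  private static FSDataOutputStream createFileAlternative(FileSystem fs,
      Path splitFile, Configuration job) throws IOException {
    short replication = (short) job.getInt("mapred.submit.replication", 10);
    // the write pipeline itself would then be built with 'replication' datanodes
    return fs.create(splitFile,
        new FsPermission(JobSubmissionFiles.JOB_FILE_PERMISSION),
        true,                                      // overwrite
        job.getInt("io.file.buffer.size", 4096),   // buffer size
        replication,                               // pipeline width
        fs.getDefaultBlockSize(),                  // block size
        null);                                     // no progress callback
  }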

The reason I noticed is that we had inadvertently specified mapred.submit.replication as *less
than* dfs.replication in our configuration, which caused a bunch of excess replica pruning
(and ultimately IOExceptions in our datanode logs).
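
Concretely, the bad combination looked roughly like this (values illustrative):

    Configuration conf = new Configuration();
    conf.setInt("dfs.replication", 3);            // cluster default
    conf.setInt("mapred.submit.replication", 2);  // accidentally lower
    // createFile() above writes 3 replicas through the pipeline, and the
    // subsequent setReplication(2) makes the namenode schedule deletion of
    // the now-excess replica on one of the datanodes.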

Thanks,
Eric

