hadoop-common-dev mailing list archives

From "Koji Noguchi (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-6084) Distinguishing file missing/corruption for low replication files
Date Fri, 19 Jun 2009 01:29:07 GMT
Distinguishing file missing/corruption for low replication files

                 Key: HADOOP-6084
                 URL: https://issues.apache.org/jira/browse/HADOOP-6084
             Project: Hadoop Core
          Issue Type: Improvement
          Components: dfs
            Reporter: Koji Noguchi

In PIG-856, there's a discussion about reducing the replication factor for intermediate files
between jobs.
I've seen users do the same in MapReduce jobs and get some speedup. (I believe their outputs
were too small to benefit from the pipelining.)
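For reference, this is roughly how users lower the replication factor (the paths and job names below are illustrative, not from the issue):

```shell
# Lower replication to 1 on an existing intermediate directory;
# -w waits until the change has been applied.
hadoop fs -setrep -w 1 /user/alice/tmp/intermediate

# Or set it at write time for a job's output via the job configuration
# (myjob.jar / MyJob are hypothetical placeholders):
hadoop jar myjob.jar MyJob -Ddfs.replication=1 input output
```

With replication 1, the loss of any single datanode holding a block makes that file unrecoverable, which is exactly what triggers the alerts described below.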

The problem is that when users start reducing replication to 1 (or 2), ops starts seeing alerts from
fsck and HADOOP-4103 after even a single datanode failure.
There is also the problem of the Namenode not getting out of safemode when restarted.

My answer has been to ask the users, "please don't set the replication lower than 3."
But is this the right approach?

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
