hadoop-hdfs-issues mailing list archives

From "Robert Chansler (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-268) Distinguishing file missing/corruption for low replication files
Date Fri, 03 Jul 2009 00:31:47 GMT

    [ https://issues.apache.org/jira/browse/HDFS-268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726750#action_12726750 ]

Robert Chansler commented on HDFS-268:

Does anybody have robust evidence that making the replication factor small (or large, that
happens, too!) really helps for an example that is neither contrived nor too small to justify
optimizing the system?

I'd be content not to allow small RF. But if the will of the user is respected, life is too
short for system administrators to be interrupted whenever a block is lost from a file with
small RF. Diagnostic and monitoring tools should quickly dismiss alerts for such lost blocks.

Maybe the system should just heal itself. Delete the broken file and get on with life. Or
truncate it to the last good block. Or append "You lose!" to the file name.

> Distinguishing file missing/corruption for low replication files
> ----------------------------------------------------------------
>                 Key: HDFS-268
>                 URL: https://issues.apache.org/jira/browse/HDFS-268
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Koji Noguchi
> In PIG-856, there's a discussion about reducing the replication factor for intermediate
> files between jobs.
> I've seen users doing the same in mapreduce jobs getting some speed up. (I believe their
> outputs were too small to benefit from the pipelining.)
> Problem is, when users start changing replications to 1 (or 2), ops starts seeing alerts
> from fsck and HADOOP-4103 even with a single datanode failure.
> Also, problem of Namenode not getting out of safemode when restarted.
> My answer has been asking the users "please don't change the replication less than 3".
> But is this the right approach?

