hadoop-common-dev mailing list archives

From "Allen Wittenauer (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-2540) Empty blocks make fsck report corrupt, even when it isn't
Date Tue, 08 Jan 2008 00:30:34 GMT
Empty blocks make fsck report corrupt, even when it isn't

                 Key: HADOOP-2540
                 URL: https://issues.apache.org/jira/browse/HADOOP-2540
             Project: Hadoop
          Issue Type: Bug
          Components: dfs
    Affects Versions: 0.15.1
            Reporter: Allen Wittenauer

If the name node crashes after blocks have been allocated and before the content has been
uploaded, fsck will report the zero sized files as corrupt upon restart:

/user/rajive/rand0/_task_200712121358_0001_m_000808_0/part-00808: MISSING 1 blocks of total
size 0 B

... even though all blocks are accounted for:

 Total size:    2932802658847 B
 Total blocks:  26603 (avg. block size 110243305 B)
 Total dirs:    419
 Total files:   5031
 Over-replicated blocks:        197 (0.740518 %)
 Under-replicated blocks:       0 (0.0 %)
 Target replication factor:     3
 Real replication factor:       3.0074053

The filesystem under path '/' is CORRUPT

In UFS and related filesystems, such files would get put into lost+found after an fsck and
the filesystem would return to normal.  It would be super if HDFS could do a similar
thing.  Perhaps once all of the nodes listed in the name node's 'includes' file have reported
in, HDFS could automatically run an fsck and store these not-necessarily-broken files in
something like lost+found.

Files that are actually missing blocks, however, should not be touched.
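The proposed triage could be sketched roughly as follows. This is an illustrative model only, not actual Hadoop code: the class, enum, and method names (FsckTriage, classify, Action) are hypothetical, and it assumes fsck can distinguish a zero-length file with unwritten allocated blocks from a file with real data that is missing blocks.

```java
// Hypothetical sketch of the proposed fsck triage described above.
// Zero-length files whose allocated blocks were never written (e.g. after a
// name node crash mid-upload) get parked in lost+found; files with real data
// but missing blocks are left alone and still reported as corrupt.
public class FsckTriage {
    enum Action { LEAVE_INTACT, MOVE_TO_LOST_FOUND, REPORT_CORRUPT }

    // totalSize: bytes the name node records for the file;
    // missingBlocks: blocks fsck could not locate on any data node.
    static Action classify(long totalSize, int missingBlocks) {
        if (missingBlocks == 0) {
            return Action.LEAVE_INTACT;       // healthy file
        }
        if (totalSize == 0) {
            // Blocks allocated but never written: not real data loss,
            // so quarantine the entry in something like lost+found.
            return Action.MOVE_TO_LOST_FOUND;
        }
        return Action.REPORT_CORRUPT;         // genuine missing data
    }

    public static void main(String[] args) {
        // The zero-byte task output from the fsck report above:
        System.out.println(classify(0L, 1));
        // A file with real data and a missing block stays untouched:
        System.out.println(classify(110243305L, 1));
    }
}
```

Under this scheme, only the first case would move to lost+found after restart; the second would continue to mark the filesystem corrupt, as the report requests.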

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
