From "Takanobu Asanuma (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7687) Change fsck to support EC files
Date Tue, 14 Apr 2015 04:58:12 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14493560#comment-14493560

Takanobu Asanuma commented on HDFS-7687:

I wrote a simple fsck test:
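// Create the EC file /striped/ecfile with numBlocks (4) block groups.
Path ecDirPath = new Path("/striped");
Path ecFilePath = new Path(ecDirPath, "ecfile");
int numBlocks = 4;
DFSTestUtil.createECFile(cluster, ecFilePath, ecDirPath, numBlocks, NUM_STRIPE_PER_BLOCK);
// Run fsck on "/", expecting exit code 0.
runFsck(conf, 0, true, "/");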

Here is the fsck output:
 Total size:	12582912 B
 Total dirs:	2
 Total files:	1
 Total symlinks:		0
 Total blocks (validated):	4 (avg. block size 3145728 B)
 Minimally replicated blocks:	4 (100.0 %)
 Over-replicated blocks:	4 (100.0 %)
 Under-replicated blocks:	0 (0.0 %)
 Mis-replicated blocks:		4 (100.0 %)
 Default replication factor:	3
 Average block replication:	9.0
 Corrupt blocks:		0
 Missing replicas:		0 (NaN %)
 Number of data-nodes:		9
 Number of racks:		1
FSCK ended at Tue Apr 14 13:04:16 JST 2015 in 9 milliseconds

The filesystem under path '/' is HEALTHY
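The averages in this summary follow directly from the counts. The sketch below recomputes them. Its inputs are copied from the output above. The 36 stored internal blocks are simply 4 block groups times the reported average replication of 9.0. Reading the 9 internal blocks per group as 6 data + 3 parity (an RS-6-3 layout) is an assumption, not something fsck prints. The class and method names are hypothetical.

```java
// Sanity check of the fsck summary; inputs are taken from that output.
// Class and method names are hypothetical, not part of HDFS.
public class FsckSummaryCheck {

    // Average block size as fsck reports it: total bytes / validated blocks.
    static long averageBlockSize(long totalBytes, long blocks) {
        return totalBytes / blocks;
    }

    // Average replication: stored replicas (here, internal blocks) per validated block.
    static double averageReplication(long storedReplicas, long blocks) {
        return (double) storedReplicas / blocks;
    }

    public static void main(String[] args) {
        long totalBytes = 12_582_912L; // "Total size"
        long blocks = 4;               // "Total blocks (validated)": one per block group
        long storedReplicas = 36;      // 4 groups x 9 internal blocks (assumed RS-6-3)
        int replicationFactor = 3;     // "Default replication factor"

        System.out.println(averageBlockSize(totalBytes, blocks));       // 3145728 (3 MiB)
        System.out.println(averageReplication(storedReplicas, blocks)); // 9.0
        // Replication-only logic sees 9 > 3 and flags every group as over-replicated.
        System.out.println(averageReplication(storedReplicas, blocks) > replicationFactor); // true
    }
}
```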

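From the results, BlockStripedInfo (an EC block group) is counted as an over-replicated block because the current fsck only understands replication. It compares the internal blocks of each group against the replication factor (average block replication 9.0 vs. a default replication factor of 3). I think fsck should report replicated blocks and EC block groups separately. For example: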

 Total size:
 Total dirs:
 Total files:
 Total symlinks:
 Number of data-nodes:
 Number of racks:
 Total blocks (validated):
 Minimally replicated blocks:
 Over-replicated blocks:
 Under-replicated blocks:
 Mis-replicated blocks:
 Default replication factor:
 Average block replication:
 Corrupt blocks:
 Missing replicas:
 Total EC block groups (validated):
 Over EC block groups:
 Under EC block groups:
 Mis EC block groups:
 Default EC schema:
 Corrupt EC block groups:
 Missing EC block groups:
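To make the proposed split concrete, here is a minimal sketch that keeps separate counters for replicated blocks and EC block groups. Each category is compared against its own expected width: the replication factor for replicated blocks, and data + parity units for a block group. All class, field, and method names are hypothetical and are not HDFS APIs. The 9-unit group width is an assumption (RS-6-3).

```java
/**
 * Sketch of the proposed fsck split: replicated blocks and EC block groups
 * are tallied separately. All names are hypothetical, not HDFS APIs.
 */
public class EcAwareFsckTally {

    /** One replicated block or one EC block group, as a hypothetical fsck sees it. */
    static final class BlockSummary {
        final boolean striped; // true for an EC block group
        final int expected;    // replication factor, or data + parity units for EC
        final int live;        // live replicas, or live internal blocks for EC

        BlockSummary(boolean striped, int expected, int live) {
            this.striped = striped;
            this.expected = expected;
            this.live = live;
        }
    }

    /** Total/over/under counters for one category. */
    static final class Counts {
        int total;
        int over;
        int under;

        void add(BlockSummary b) {
            total++;
            if (b.live > b.expected) {
                over++;
            } else if (b.live < b.expected) {
                under++;
            }
        }
    }

    final Counts replicated = new Counts();
    final Counts ecGroups = new Counts();

    /** Routes each entry to its own counters, so EC groups never skew replication stats. */
    void check(BlockSummary b) {
        if (b.striped) {
            ecGroups.add(b);
        } else {
            replicated.add(b);
        }
    }

    public static void main(String[] args) {
        EcAwareFsckTally fsck = new EcAwareFsckTally();
        // The 4 block groups from the test above, assuming 9 live internal blocks each.
        for (int i = 0; i < 4; i++) {
            fsck.check(new BlockSummary(true, 9, 9));
        }
        System.out.println("Total EC block groups (validated):\t" + fsck.ecGroups.total); // 4
        System.out.println("Over EC block groups:\t" + fsck.ecGroups.over);               // 0
        System.out.println("Over-replicated blocks:\t" + fsck.replicated.over);           // 0
    }
}
```

This sketch compares only counts. A real EC check would also need to know which internal block indices are live, because two copies of one data block do not make up for a missing parity block. Rack placement, which the proposed "Mis EC block groups" line would cover, is left out.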

How does that look?

> Change fsck to support EC files
> -------------------------------
>                 Key: HDFS-7687
>                 URL: https://issues.apache.org/jira/browse/HDFS-7687
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Tsz Wo Nicholas Sze
>            Assignee: Takanobu Asanuma
> We need to change fsck so that it can detect "under replicated" and corrupted EC files.

