hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Wellington Chevreuil (JIRA)" <j...@apache.org>
Subject [jira] [Work started] (HDFS-12618) fsck -includeSnapshots reports wrong amount of total blocks
Date Tue, 17 Oct 2017 12:31:00 GMT

     [ https://issues.apache.org/jira/browse/HDFS-12618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Work on HDFS-12618 started by Wellington Chevreuil.
---------------------------------------------------
> fsck -includeSnapshots reports wrong amount of total blocks
> -----------------------------------------------------------
>
>                 Key: HDFS-12618
>                 URL: https://issues.apache.org/jira/browse/HDFS-12618
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: tools
>    Affects Versions: 3.0.0-alpha3
>            Reporter: Wellington Chevreuil
>            Assignee: Wellington Chevreuil
>            Priority: Minor
>         Attachments: HDFS-121618.001.patch, HDFS-121618.initial
>
>
> When snapshot is enabled, if a file is deleted but is contained by a snapshot, *fsck*
will not reported blocks for such file, showing different number of *total blocks* than what
is exposed in the Web UI. 
> This should be fine, as *fsck* provides *-includeSnapshots* option. The problem is that
*-includeSnapshots* option causes *fsck* to count blocks for every occurrence of a file on
snapshots, which is wrong because these blocks should be counted only once (for instance,
if a 100MB file is present on 3 snapshots, it would still map to one block only in hdfs).
This causes fsck to report much more blocks than what actually exist in hdfs and is reported
in the Web UI.
> Here's an example:
> 1) HDFS has two files of 2 blocks each:
> {noformat}
> $ hdfs dfs -ls -R /
> drwxr-xr-x   - root supergroup          0 2017-10-07 21:21 /snap-test
> -rw-r--r--   1 root supergroup  209715200 2017-10-07 20:16 /snap-test/file1
> -rw-r--r--   1 root supergroup  209715200 2017-10-07 20:17 /snap-test/file2
> drwxr-xr-x   - root supergroup          0 2017-05-13 13:03 /test
> {noformat} 
> 2) There are two snapshots, with the two files present on each of the snapshots:
> {noformat}
> $ hdfs dfs -ls -R /snap-test/.snapshot
> drwxr-xr-x   - root supergroup          0 2017-10-07 21:21 /snap-test/.snapshot/snap1
> -rw-r--r--   1 root supergroup  209715200 2017-10-07 20:16 /snap-test/.snapshot/snap1/file1
> -rw-r--r--   1 root supergroup  209715200 2017-10-07 20:17 /snap-test/.snapshot/snap1/file2
> drwxr-xr-x   - root supergroup          0 2017-10-07 21:21 /snap-test/.snapshot/snap2
> -rw-r--r--   1 root supergroup  209715200 2017-10-07 20:16 /snap-test/.snapshot/snap2/file1
> -rw-r--r--   1 root supergroup  209715200 2017-10-07 20:17 /snap-test/.snapshot/snap2/file2
> {noformat}
> 3) *fsck -includeSnapshots* reports 12 blocks in total (4 blocks for the normal file
path, plus 4 blocks for each snapshot path):
> {noformat}
> $ hdfs fsck / -includeSnapshots
> FSCK started by root (auth:SIMPLE) from /127.0.0.1 for path / at Mon Oct 09 15:15:36
BST 2017
> Status: HEALTHY
>  Number of data-nodes:	1
>  Number of racks:		1
>  Total dirs:			6
>  Total symlinks:		0
> Replicated Blocks:
>  Total size:	1258291200 B
>  Total files:	6
>  Total blocks (validated):	12 (avg. block size 104857600 B)
>  Minimally replicated blocks:	12 (100.0 %)
>  Over-replicated blocks:	0 (0.0 %)
>  Under-replicated blocks:	0 (0.0 %)
>  Mis-replicated blocks:		0 (0.0 %)
>  Default replication factor:	1
>  Average block replication:	1.0
>  Missing blocks:		0
>  Corrupt blocks:		0
>  Missing replicas:		0 (0.0 %)
> {noformat}
> 4) Web UI shows the correct number (4 blocks only):
> {noformat}
> Security is off.
> Safemode is off.
> 5 files and directories, 4 blocks = 9 total filesystem object(s).
> {noformat}
> I would like to work on this solution, will propose an initial solution shortly.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org


Mime
View raw message