hadoop-hdfs-issues mailing list archives

From "Xiao Chen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-12618) fsck -includeSnapshots reports wrong amount of total blocks
Date Fri, 20 Oct 2017 21:27:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-12618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16213269#comment-16213269 ]

Xiao Chen commented on HDFS-12618:
----------------------------------

Thanks for reporting and working on this [~wchevreuil]!

I agree this is an issue, but I'm not sure what the best solution would be - this appears to
be a difficult problem. Let me look into this too, to see if any other ideas pop up.

The reason the web UI shows the correct number is that it simply reports the total block
count from the block manager. I'm afraid we don't track anything like 'total blocks under a
directory', so that counter isn't useful for fsck.
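
As an aside, the figure the web UI uses is the cluster-wide {{BlocksTotal}} metric on the
{{FSNamesystem}} bean, readable from the NameNode's {{/jmx}} servlet. A minimal probe for
reference (the host and port are assumptions; 9870 is the Hadoop 3 default HTTP port):
{noformat}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

// Prints the cluster-wide BlocksTotal metric, i.e. the block manager's
// count that the web UI displays. There is no per-directory breakdown.
public class BlocksTotalProbe {
  public static void main(String[] args) throws Exception {
    String url = "http://localhost:9870/jmx"
        + "?qry=Hadoop:service=NameNode,name=FSNamesystem";
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(new URL(url).openStream(), "UTF-8"))) {
      String line;
      while ((line = in.readLine()) != null) {
        // The /jmx servlet pretty-prints one attribute per line.
        if (line.contains("BlocksTotal")) {
          System.out.println(line.trim());
        }
      }
    }
  }
}
{noformat}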

Some issues I can see from the current patch:
- {{snapshotSeenBlocks}} could be huge if the snapshot dir is at the top level and has a lot
of blocks under it. In extreme cases, this may put memory pressure on the NN. At a minimum,
we should make sure the extra space is allocated only when {{-includeSnapshots}} is set (see
the sketch after this list).
- Wherever inodes are involved, the file system lock must be held. This adds load to the NN,
which should be minimized.
- Catching a {{RuntimeException}} and re-throwing it looks like bad practice. What are the
RTEs that could be thrown that we need to wrap?
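
To illustrate the first point, a self-contained, hypothetical sketch (class and field names
are illustrative, not from the patch): the dedup set is allocated lazily, so a plain fsck run
pays no extra memory, and each block ID is counted at most once:
{noformat}
import java.util.HashSet;
import java.util.Set;

public class SnapshotBlockCounter {
  private Set<Long> seenBlockIds; // stays null unless dedup is requested

  long count(long[][] fileBlockIds, boolean dedupAcrossSnapshots) {
    long total = 0;
    for (long[] blocks : fileBlockIds) {
      for (long id : blocks) {
        if (dedupAcrossSnapshots) {
          if (seenBlockIds == null) {
            seenBlockIds = new HashSet<>(); // lazy: only pay when -includeSnapshots is set
          }
          if (!seenBlockIds.add(id)) {
            continue; // same block already counted via another snapshot path
          }
        }
        total++;
      }
    }
    return total;
  }

  public static void main(String[] args) {
    // file1/file2 (2 blocks each), visible via the live path and 2 snapshots:
    long[][] paths = {
        {1, 2}, {3, 4},   // /snap-test/file1, /snap-test/file2
        {1, 2}, {3, 4},   // /snap-test/.snapshot/snap1/...
        {1, 2}, {3, 4},   // /snap-test/.snapshot/snap2/...
    };
    System.out.println(new SnapshotBlockCounter().count(paths, false)); // 12 (over-count)
    System.out.println(new SnapshotBlockCounter().count(paths, true));  // 4  (deduplicated)
  }
}
{noformat}
Run against the layout from this issue, it prints 12 without dedup and 4 with it, matching
the discrepancy reported below.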



> fsck -includeSnapshots reports wrong amount of total blocks
> -----------------------------------------------------------
>
>                 Key: HDFS-12618
>                 URL: https://issues.apache.org/jira/browse/HDFS-12618
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: tools
>    Affects Versions: 3.0.0-alpha3
>            Reporter: Wellington Chevreuil
>            Assignee: Wellington Chevreuil
>            Priority: Minor
>         Attachments: HDFS-121618.initial, HDFS-12618.001.patch
>
>
> When snapshots are enabled, if a file is deleted but is still referenced by a snapshot,
> *fsck* will not report blocks for that file, showing a different number of *total blocks*
> than what is exposed in the Web UI.
> This should be fine, as *fsck* provides the *-includeSnapshots* option. The problem is
> that *-includeSnapshots* causes *fsck* to count blocks for every occurrence of a file in
> snapshots, which is wrong because these blocks should be counted only once (for instance,
> if a 100MB file is present in 3 snapshots, it still maps to only one block in HDFS). This
> causes fsck to report many more blocks than actually exist in HDFS and are shown in the
> Web UI.
> Here's an example:
> 1) HDFS has two files of 2 blocks each:
> {noformat}
> $ hdfs dfs -ls -R /
> drwxr-xr-x   - root supergroup          0 2017-10-07 21:21 /snap-test
> -rw-r--r--   1 root supergroup  209715200 2017-10-07 20:16 /snap-test/file1
> -rw-r--r--   1 root supergroup  209715200 2017-10-07 20:17 /snap-test/file2
> drwxr-xr-x   - root supergroup          0 2017-05-13 13:03 /test
> {noformat} 
> 2) There are two snapshots, with the two files present on each of the snapshots:
> {noformat}
> $ hdfs dfs -ls -R /snap-test/.snapshot
> drwxr-xr-x   - root supergroup          0 2017-10-07 21:21 /snap-test/.snapshot/snap1
> -rw-r--r--   1 root supergroup  209715200 2017-10-07 20:16 /snap-test/.snapshot/snap1/file1
> -rw-r--r--   1 root supergroup  209715200 2017-10-07 20:17 /snap-test/.snapshot/snap1/file2
> drwxr-xr-x   - root supergroup          0 2017-10-07 21:21 /snap-test/.snapshot/snap2
> -rw-r--r--   1 root supergroup  209715200 2017-10-07 20:16 /snap-test/.snapshot/snap2/file1
> -rw-r--r--   1 root supergroup  209715200 2017-10-07 20:17 /snap-test/.snapshot/snap2/file2
> {noformat}
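> As a sketch, this layout can also be produced programmatically (the class below is
> illustrative and untested; the 104857600-byte block size and the paths come from the
> listings here; the CLI equivalents are *hdfs dfsadmin -allowSnapshot* and *hdfs dfs
> -createSnapshot*):
> {noformat}
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FSDataOutputStream;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.hdfs.DistributedFileSystem;
>
> public class SnapshotSetup {
>   public static void main(String[] args) throws Exception {
>     Configuration conf = new Configuration();
>     // Assumes fs.defaultFS points at the HDFS cluster under test.
>     DistributedFileSystem fs = (DistributedFileSystem) FileSystem.get(conf);
>     Path dir = new Path("/snap-test");
>     fs.mkdirs(dir);
>     byte[] mib = new byte[1024 * 1024];
>     for (String name : new String[] {"file1", "file2"}) {
>       // 200 MB file with a 100 MB block size => 2 blocks per file
>       try (FSDataOutputStream out = fs.create(new Path(dir, name), true,
>           4096, (short) 1, 104857600L)) {
>         for (int i = 0; i < 200; i++) {
>           out.write(mib);
>         }
>       }
>     }
>     fs.allowSnapshot(dir);           // hdfs dfsadmin -allowSnapshot /snap-test
>     fs.createSnapshot(dir, "snap1"); // hdfs dfs -createSnapshot /snap-test snap1
>     fs.createSnapshot(dir, "snap2");
>   }
> }
> {noformat}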
> 3) *fsck -includeSnapshots* reports 12 blocks in total (4 blocks for the live file paths,
> plus 4 blocks for each of the two snapshot paths: 4 + 4 + 4 = 12):
> {noformat}
> $ hdfs fsck / -includeSnapshots
> FSCK started by root (auth:SIMPLE) from /127.0.0.1 for path / at Mon Oct 09 15:15:36 BST 2017
> Status: HEALTHY
>  Number of data-nodes:	1
>  Number of racks:		1
>  Total dirs:			6
>  Total symlinks:		0
> Replicated Blocks:
>  Total size:	1258291200 B
>  Total files:	6
>  Total blocks (validated):	12 (avg. block size 104857600 B)
>  Minimally replicated blocks:	12 (100.0 %)
>  Over-replicated blocks:	0 (0.0 %)
>  Under-replicated blocks:	0 (0.0 %)
>  Mis-replicated blocks:		0 (0.0 %)
>  Default replication factor:	1
>  Average block replication:	1.0
>  Missing blocks:		0
>  Corrupt blocks:		0
>  Missing replicas:		0 (0.0 %)
> {noformat}
> 4) Web UI shows the correct number (4 blocks only):
> {noformat}
> Security is off.
> Safemode is off.
> 5 files and directories, 4 blocks = 9 total filesystem object(s).
> {noformat}
> I would like to work on this, and will propose an initial solution shortly.


