hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Xiao Chen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9231) fsck doesn't explicitly list when Bad Replicas/Blocks are in a snapshot
Date Fri, 23 Oct 2015 17:56:27 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971469#comment-14971469
] 

Xiao Chen commented on HDFS-9231:
---------------------------------

Thanks a lot for the review [~yzhangal]!
{quote}
1. The description is not quite accurate per our discussion, suggest to modify. Especially
the patch actually does change (and fix) the behavior when without -includeSnapshots.
{quote}
It was great to talk to you. I have updated the description. Modified patch summary in the
end of this comment.
{quote}
2. A possible optimization in FSDirSnapshotOp#getSnapshotFiles. It seems that the sf variable
could be calculated in caller for once before the loop in the caller, and pass to this method.
{quote}
My apologies for the confusion, I added some comments in this method. But getting sf for each
snapshottable dir is needed, since /d1 and /d2 have different snapshotlist.
{quote}
3. final INodesInPath iip = fsd.getINodesInPath4Write(snap, false); maybe substituted with
call to getINodesInPath
{quote}
Good catch! I updated the code to call {{getINode}} which invokes {{getINodesInPath}}.
{quote}
4. The check if (!corruptFileBlocks.isEmpty()) in listCorruptFileBlocksWithSnapshot is not
needed
{quote}
Good call. Fixed.
{quote}
5. Add comment in listCorruptFileBlocks() before the call namenode.getNamesystem().listCorruptFileBlocksWithSnapshot,
to indicate that snapshottableDirs is only relevant when -includeSnapshots is specified.
{quote}
Added a link to {{FSNamesystem#listCorruptFileBlocksWithSnapshot}} which explains that parameter
in javadoc.
{quote}
6. In listCorruptFileBlocksWithSnapshot, we can add
{code}
if (snapshottableDirs == null) {
  continue;
}
{code}
to avoid the call to getSnapshotFiles.
{quote}
I'm not sure this is necessary. On one hand, it definitely saves 1 call stack. On the other
hand, with the existence of all those loops and checks, I think the performance gain of saving
1 call stack would be trivial.  And the nullity check of snapshottableDirs is already performed
as a first step in {{getSnapshotFiles}}.

Attached patch 005 with the above modifications. Updated summary below:
- {{fsck -list-corruptfileblocks -includeSnapshots}} will also show full dir of snapshots
- {{fsck -list-corruptfileblocks}} without -includeSnapshots will not show corrupt blocks
that only have snapshot files
- NameNode WebUI's way of showing corrupted files/blocks unchanged.
- Added a sentence in NN WebUI to hint the admin to run fsck with -includeSnapshots, if there're
snapshots present in the system.
- Some refactoring to reuse existing code in new methods getSnapshottableDirs and ListCorruptFileBlocksWithSnapshot
- The reasoning of keep minimal change to NN WebUI and fsck without -includeSnapshots is that
getting all possible snapshots may be slow.

> fsck doesn't explicitly list when Bad Replicas/Blocks are in a snapshot
> -----------------------------------------------------------------------
>
>                 Key: HDFS-9231
>                 URL: https://issues.apache.org/jira/browse/HDFS-9231
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: snapshots
>            Reporter: Xiao Chen
>            Assignee: Xiao Chen
>         Attachments: HDFS-9231.001.patch, HDFS-9231.002.patch, HDFS-9231.003.patch, HDFS-9231.004.patch,
HDFS-9231.005.patch
>
>
> Currently for snapshot files, {{fsck -list-corruptfileblocks}} shows corrupt blocks with
the original file dir instead of the snapshot dir, and {{fsck -list-corruptfileblocks -includeSnapshots}}
behave the same.
> This can be confusing because even when the original file is deleted, fsck will still
show that deleted file as corrupted, although what's actually corrupted is the snapshot. 
> As a side note, {{fsck -files -includeSnapshots}} shows the snapshot dirs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message