hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Gabor Bota (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HDFS-13818) Extend OIV to detect FSImage corruption
Date Fri, 24 Aug 2018 13:05:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-13818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16591601#comment-16591601
] 

Gabor Bota edited comment on HDFS-13818 at 8/24/18 1:04 PM:
------------------------------------------------------------

Thanks for working on this [~adam.antal]!

First pass review for patch v1:
- On {{PBImageTextWriter:120}} please comment on when {{PBImageTextWriter#getName}} will return
when corruption detected.
- Please describe why, and how do you use {{PBImageTextWriter#putDirChildToMetadataMap}}.
Eg. add a javadoc on top of the method.
- In {{PBImageTextWriter#afterOutput}} describe why the method is empty. Could we change it
to be abstract?
- Can we decide in {{PBImageTextWriter#outputINodes}} if the ignored inode is ignored because
of it's a snapshot or because of corruption? What would be the additional cost for such detection?
- {{PBImageCorruptionDetector#getTypeOfId}} please store the literals in static final string.
- In {{PBImageCorruptionDetector#Corruption#CorruptionType}} please move the second level
inner class to another class, so do not use two levels of inner classes, for better readability.
- {{CorruptionType#setMissingChild}} and {{CorruptionType#setCorruptNode}} could be simple
setters, where a parameter could be passed.
- Maybe you could use a builder pattern in {{PBImageCorruptionDetector#Corruption}} for the
entry and use it in {{PBImageCorruptionDetector#getEntry}} and {{PBImageCorruptionDetector#afterOutput}}
(and other cases when you use it)
- In {{PBImageCorruptionDetector:210}} use {{corruptionsMap}} instead of {{corruption}} variable
name for better readability
- Maybe we know what the parentPath in {{PBImageCorruptionDetector#afterOutput]} on {{PBImageCorruptionDetector:347}}?
What could be a way for getting it?
- Fix the checkstyle issues

Other questions:
 - What is additional memory footprint on this compared to delimited output?
 - What is additional processing (time) footprint?
 - How much better is this solution compared to starting on NameNode for detecting corrupted
fsimage? I'm thinking of time (complexity), memory footprint, and usage.

For test cases:
 - Please provide some unit test for some edge cases for the newly added functionality.
 - Some functional tests: it would be good to prove that all types of corruptions are covered.

For docs:
 - It would be nice to provide a short specification doc on the change (how will it work,
what are the additional steps on top of the current delimited implementation) and also how
could we use this feature and the output of this.
 - There's an [Offline Image Viewer Guide|https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsImageViewer.html]
which should be extended with the description of this feature.


was (Author: gabor.bota):
Thanks for working on this [~adam.antal]!

First pass review for patch v1:
- On {PBImageTextWriter:120} please comment on when {PBImageTextWriter#getName} will return
when corruption detected.
- Please describe why, and how do you use {PBImageTextWriter#putDirChildToMetadataMap}. Eg.
add a javadoc on top of the method.
- In {PBImageTextWriter#afterOutput} describe why the method is empty. Could we change it
to be abstract?
- Can we decide in {PBImageTextWriter#outputINodes}if the ignored inode is ignored because
of it's a snapshot or because of corruption? What would be the additional cost for such detection?
- {PBImageCorruptionDetector#getTypeOfId}please store the literals in static final string.
- In {PBImageCorruptionDetector#Corruption#CorruptionType}please move the second level inner
class to another class, so do not use two levels of inner classes, for better readability.
- {CorruptionType#setMissingChild} and {CorruptionType#setCorruptNode} could be simple setters,
where a parameter could be passed.
- Maybe you could use a builder pattern in {PBImageCorruptionDetector#Corruption} for the
entry and use it in {PBImageCorruptionDetector#getEntry} and {PBImageCorruptionDetector#afterOutput}
(and other cases when you use it)
- In {PBImageCorruptionDetector:210} use {corruptionsMap} instead of \{corruption} variable
name for better readability
- Maybe we know what the parentPath in {PBImageCorruptionDetector#afterOutput} on {PBImageCorruptionDetector:347}?
What could be a way for getting it?
- Fix the checkstyle issues

Other questions:
 - What is additional memory footprint on this compared to delimited output?
 - What is additional processing (time) footprint?
 - How much better is this solution compared to starting on NameNode for detecting corrupted
fsimage? I'm thinking of time (complexity), memory footprint, and usage.

For test cases:
 - Please provide some unit test for some edge cases for the newly added functionality.
 - Some functional tests: it would be good to prove that all types of corruptions are covered.

For docs:
 - It would be nice to provide a short specification doc on the change (how will it work,
what are the additional steps on top of the current delimited implementation) and also how
could we use this feature and the output of this.
 - There's an [Offline Image Viewer Guide|https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsImageViewer.html]
which should be extended with the description of this feature.

> Extend OIV to detect FSImage corruption
> ---------------------------------------
>
>                 Key: HDFS-13818
>                 URL: https://issues.apache.org/jira/browse/HDFS-13818
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs
>            Reporter: Adam Antal
>            Assignee: Adam Antal
>            Priority: Major
>         Attachments: HDFS-13818.001.patch
>
>
> A follow-up Jira for HDFS-13031: an improvement of the OIV is suggested for detecting
corruptions like HDFS-13101 in an offline way.
> The reasoning is the following. Apart from a NN startup throwing the error, there is
nothing in the customer's hand that could reassure him/her that the FSImages is good or corrupted.
> Although real full checking of the FSImage is only possible by the NN, for stack traces
associated with the observed corruption cases the solution of putting up a tertiary NN is
a little bit of overkill. The OIV would be a handy choice, already having functionality
like loading the fsimage and constructing the folder structure, we just have to add the option
of detecting the null INodes. For e.g. the Delimited OIV processor can already use in disk
MetadataMap, which reduces memory consumption. Also there may be a window for parallelizing:
iterating through INodes for e.g. could be done distributed, increasing efficiency, and we
wouldn't need a high mem-high CPU setup for just checking the FSImage.
> The suggestion is to add a --detectCorruption option to the OIV which would check the
FSImage for consistency.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org


Mime
View raw message