hadoop-hdfs-issues mailing list archives

From "Manoj Govindassamy (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-11220) SnapshotDiffReport should detect open files in HDFS Snapshots
Date Fri, 17 Feb 2017 19:04:41 GMT

    [ https://issues.apache.org/jira/browse/HDFS-11220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15872315#comment-15872315

Manoj Govindassamy commented on HDFS-11220:

HDFS-11402 (HDFS Snapshots should capture point-in-time copies of OPEN files) can help solve
this issue as well. I will add more tests and cases as part of this bug once HDFS-11402 is resolved.

> SnapshotDiffReport should detect open files in HDFS Snapshots
> -------------------------------------------------------------
>                 Key: HDFS-11220
>                 URL: https://issues.apache.org/jira/browse/HDFS-11220
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: snapshots
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Manoj Govindassamy
>            Assignee: Manoj Govindassamy
> *Problem:*
> 1. When files are being written while HDFS Snapshots are taken in parallel, the Snapshots
do capture these files, but the files still being written do not have their point-in-time
length captured. Most of the time, these open files have a length of 0 or the size at the
last block boundary.
> 2. Only when the file is closed, or some other metadata modification is performed on it,
does HDFS reconcile the file length and record the modification in the last taken Snapshot.
All previously taken Snapshots continue to hold the open file with no modification recorded,
so they end up using the final modification record from the next available Snapshot. After
the file is closed, its length is therefore the same in all of those Snapshots.
> Assume File1 is opened for write and a total of 1MB is written to it. While the writes are
happening, snapshots are taken in parallel.
> {noformat}
> |---Time---T1-----------T2-------------T3----------------T4------>
> |-----------------------Snap1----------Snap2-------------Snap3--->
> |---File1.open---write---------write-----------close------------->
> {noformat}
> Then at time,
> T2:
> Snap1.File1.length = 0
> T3:
> Snap1.File1.length = 0
> Snap2.File1.length = 0
> <File1 write completed and closed>
> T4:
> Snap1.File1.length = 1MB
> Snap2.File1.length = 1MB
> Snap3.File1.length = 1MB
> So a Snapshot Diff Report run against any of the above snapshots will not detect any
changes in the open files.
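
The timeline above can be replayed with a small toy model. This is only a sketch built from the description in this issue, not HDFS code: the class `ToyNamespace`, its methods, and `run_timeline` are hypothetical names, and the model assumes an open file's NameNode-visible length stays at 0 until close.

```python
MB = 1024 * 1024


class ToyNamespace:
    """Toy model of one file (File1) and its snapshots; not HDFS code."""

    def __init__(self):
        self.snapshots = []      # snapshot names, oldest first
        self.records = {}        # snapshot name -> length recorded at close
        self.written = 0         # bytes the client has actually written
        self.visible_length = 0  # length visible to the (toy) NameNode

    def write(self, nbytes):
        # Assumption: writes to an open file do not update the visible length.
        self.written += nbytes

    def take_snapshot(self, name):
        self.snapshots.append(name)

    def close_file(self):
        # On close the length is reconciled and the modification is recorded
        # only in the last taken snapshot, as the issue describes.
        self.visible_length = self.written
        if self.snapshots:
            self.records[self.snapshots[-1]] = self.written

    def length_in(self, name):
        # A snapshot with no record of its own uses the record of the next
        # available snapshot, falling back to the current visible length.
        start = self.snapshots.index(name)
        for later in self.snapshots[start:]:
            if later in self.records:
                return self.records[later]
        return self.visible_length


def run_timeline():
    ns = ToyNamespace()
    seen = {}
    ns.write(MB // 2)
    ns.take_snapshot("Snap1")
    seen["T2"] = {s: ns.length_in(s) for s in ns.snapshots}
    ns.write(MB // 2)
    ns.take_snapshot("Snap2")
    seen["T3"] = {s: ns.length_in(s) for s in ns.snapshots}
    ns.close_file()
    ns.take_snapshot("Snap3")
    seen["T4"] = {s: ns.length_in(s) for s in ns.snapshots}
    return seen


if __name__ == "__main__":
    for time, lengths in run_timeline().items():
        print(time, lengths)
```

In this model every snapshot reports the same length after close, so a comparison of lengths between any two of them finds nothing to report.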
> *Proposal:*
> 1. HDFS Snapshots can stash open file details in the snapshot record.
> 2. Because the NameNode might not have accurate byte-level visibility of the length of
open files, Snapshots might not capture the accurate point-in-time length. So
SnapshotDiffReport can have an option to detect open files and always show the {{M}} flag
for them, if the files are present in both of the snapshots being compared.
> {noformat}
> hdfs snapshotDiff -includeOpenFiles <snapDir> <snapName> <snapName>
> {noformat}
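
The proposal can be illustrated with a minimal sketch of a diff that optionally flags open files. Everything here is hypothetical and is not the HDFS implementation: `FileRecord`, `snapshot_diff`, and `include_open_files` are illustrative names, a snapshot is modelled as a plain dict from path to record, and "open" is assumed to mean open in either of the two snapshots being compared.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical per-file snapshot record: length plus an open-file flag."""
    length: int
    is_open: bool = False


def snapshot_diff(from_snap, to_snap, include_open_files=False):
    """Return a list of (flag, path) tuples using the +, -, M flags."""
    report = []
    for path in sorted(from_snap.keys() | to_snap.keys()):
        old, new = from_snap.get(path), to_snap.get(path)
        if old is None:
            report.append(("+", path))   # created after from_snap
        elif new is None:
            report.append(("-", path))   # deleted before to_snap
        else:
            changed = old.length != new.length
            # Proposed option: an open file present in both snapshots is
            # always reported as modified, since its length is unreliable.
            if include_open_files and (old.is_open or new.is_open):
                changed = True
            if changed:
                report.append(("M", path))
    return report


snap1 = {"/dir/File1": FileRecord(0, is_open=True), "/dir/done": FileRecord(10)}
snap2 = {"/dir/File1": FileRecord(0, is_open=True), "/dir/done": FileRecord(10)}

default_report = snapshot_diff(snap1, snap2)
open_aware_report = snapshot_diff(snap1, snap2, include_open_files=True)
```

With the default settings the two snapshots look identical; with the option enabled, the open file is reported with the `M` flag while the closed, unchanged file is still omitted.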

