hadoop-hdfs-issues mailing list archives

From "Manoj Govindassamy (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HDFS-11220) SnapshotDiffReport should detect open files in HDFS Snapshots
Date Wed, 07 Dec 2016 20:09:58 GMT
Manoj Govindassamy created HDFS-11220:
-----------------------------------------

             Summary: SnapshotDiffReport should detect open files in HDFS Snapshots
                 Key: HDFS-11220
                 URL: https://issues.apache.org/jira/browse/HDFS-11220
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: snapshots
    Affects Versions: 3.0.0-alpha1
            Reporter: Manoj Govindassamy
            Assignee: Manoj Govindassamy


*Problem:*

1. When files are being written while HDFS Snapshots are taken in parallel, the Snapshots
do capture these files, but the files still being written do not have their point-in-time
length captured. Most of the time, these open files have a length of 0 or the size at the
last block boundary.

2. Only at file close, or at any other metadata modification operation on these files, does
HDFS reconcile the file length and record the modification in the last taken Snapshot. All
previously taken Snapshots continue to hold those open files with no modification recorded,
so they end up using the final modification record in the next available Snapshot. After the
file is closed, the file length is therefore the same in all of those Snapshots.

Assume File1 is opened for write and a total of 1MB is written to it. While the writes are
happening, snapshots are taken in parallel.

{noformat}
|---Time---T1-----------T2-------------T3----------------T4------>
|-----------------------Snap1----------Snap2-------------Snap3--->
|---File1.open---write---------write-----------close------------->
{noformat}

Then, at each point in time:
T2:
Snap1.File1.length = 0

T3:
Snap1.File1.length = 0
Snap2.File1.length = 0

<File1 write completed and closed>

T4:
Snap1.File1.length = 1MB
Snap2.File1.length = 1MB
Snap3.File1.length = 1MB

So, a Snapshot Diff Report run against any of the above snapshots will not detect any
changes in the open files.
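
The timeline above can be replayed with a small toy model. This is only a sketch of the behavior as described in this report, not NameNode code: the class {{OpenFileSnapshotModel}} and all of its members are hypothetical names, and the model assumes the NameNode-visible length stays at 0 until close (it ignores the block boundary case).

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy model of the behavior described in this report; NOT actual NameNode code. */
public class OpenFileSnapshotModel {
    // Length the NameNode knows about; assumed to stay stale while the file is open.
    private long visibleLength = 0;
    // Bytes the client has actually written so far.
    private long writtenLength = 0;
    private int latestSnapshotId = 0;
    // Snapshot id -> length recorded in that snapshot's modification record.
    private final TreeMap<Integer, Long> recordedLengths = new TreeMap<>();

    public void write(long bytes) {
        writtenLength += bytes; // not reflected in visibleLength until close()
    }

    public int takeSnapshot() {
        return ++latestSnapshotId; // no length is recorded for the open file
    }

    public void close() {
        // Length is reconciled and recorded only in the last taken snapshot.
        visibleLength = writtenLength;
        if (latestSnapshotId > 0) {
            recordedLengths.put(latestSnapshotId, visibleLength);
        }
    }

    public long lengthIn(int snapshotId) {
        // A snapshot without its own record uses the next available record,
        // or the current visible length if no later record exists.
        Map.Entry<Integer, Long> next = recordedLengths.ceilingEntry(snapshotId);
        return next != null ? next.getValue() : visibleLength;
    }

    public static void main(String[] args) {
        OpenFileSnapshotModel file1 = new OpenFileSnapshotModel();
        file1.write(512 * 1024);
        int snap1 = file1.takeSnapshot();                     // T2
        System.out.println("T2 Snap1 = " + file1.lengthIn(snap1));
        file1.write(512 * 1024);
        int snap2 = file1.takeSnapshot();                     // T3
        System.out.println("T3 Snap2 = " + file1.lengthIn(snap2));
        file1.close();
        int snap3 = file1.takeSnapshot();                     // T4
        System.out.println("T4 Snap1 = " + file1.lengthIn(snap1)
                + ", Snap2 = " + file1.lengthIn(snap2)
                + ", Snap3 = " + file1.lengthIn(snap3));
    }
}
```

In this model a diff between any two of the snapshots compares equal lengths both before and after the close, which is the gap the report describes.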

*Proposal:*

1. HDFS Snapshots can stash open file details in the snapshot record.
2. Since the NameNode might not have accurate byte-level visibility of the length of open
files, Snapshots might not capture the accurate point-in-time length. So, SnapshotDiffReport
can always show the {{M}} flag for open files, provided the files are present in both of the
snapshots it is run against.
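
A minimal sketch of how the two proposal points could fit together, under stated assumptions: {{SnapshotRecord}}, {{openFiles}}, and {{diff}} are hypothetical names for illustration and are not part of the HDFS API, the output line format is illustrative, and a file is treated as open if it was open in either of the two snapshots (the proposal does not specify which).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative sketch of the proposal; all names here are hypothetical. */
public class OpenFileDiffSketch {

    /** Hypothetical snapshot record that also stashes the set of open files. */
    public static final class SnapshotRecord {
        final Map<String, Long> lengths;  // path -> length recorded in the snapshot
        final Set<String> openFiles;      // paths open for write at snapshot time

        public SnapshotRecord(Map<String, Long> lengths, Set<String> openFiles) {
            this.lengths = lengths;
            this.openFiles = openFiles;
        }
    }

    /** Reports "M" for files in both snapshots that changed length or were open. */
    public static List<String> diff(SnapshotRecord from, SnapshotRecord to) {
        List<String> report = new ArrayList<>();
        for (String path : new TreeSet<>(from.lengths.keySet())) {
            if (!to.lengths.containsKey(path)) {
                continue; // creations and deletions are out of scope for this sketch
            }
            boolean lengthChanged = !from.lengths.get(path).equals(to.lengths.get(path));
            // Assumption: open in either snapshot is enough to flag the file.
            boolean wasOpen = from.openFiles.contains(path) || to.openFiles.contains(path);
            if (lengthChanged || wasOpen) {
                report.add("M " + path);
            }
        }
        return report;
    }
}
```

Because the recorded lengths of an open file can be identical in both snapshots, the sketch relies on the stashed open file set rather than on lengths to flag the file as modified.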



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org

