hadoop-hdfs-issues mailing list archives

From "Yongjun Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-11402) HDFS Snapshots should capture point-in-time copies of OPEN files
Date Tue, 28 Mar 2017 23:18:41 GMT

    [ https://issues.apache.org/jira/browse/HDFS-11402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15946165#comment-15946165 ]

Yongjun Zhang commented on HDFS-11402:
--------------------------------------

Hi [~manojg],

Thanks a lot for your work here and very sorry for my delayed review.

The patch looks largely good to me. Below are some comments, mostly cosmetic. 

1. We can put the parameters {{leaseManager}} and {{freezeOpenFiles}} together in the API
signature, since they are used together for an optional feature. For example, in {{INodeDirectory}}
{code}
public Snapshot addSnapshot(final LeaseManager leaseManager,
      int id, String name, boolean freezeOpenFiles)
{code}
we can change it to
{code}
public Snapshot addSnapshot(int id, String name,
      final LeaseManager leaseManager,
      final boolean freezeOpenFiles)
{code}
2. Share common code between the two {{INodesInPath$fromINode}} methods.
3. Change the method name {{isAncestor}} to {{isDescendant}} in {{INodesInPath}}.
4. In {{LeaseManager}},
* {{INODE_FILTER_WORKER_COUNT}} is only used in a single method; it's better not to define it
as public, and maybe we can just move it into that method.
* change {{getINodeWithLeases(final INodeDirectory restrictFilesFromDir)}}
 to {{getINodesWithLease(final INodeDirectory ancestorDir)}},
and javadoc the behavior when {{ancestorDir}} is null or non-null
* optionally, just use the above count as a cap, and have a way to dynamically decide
how big the thread pool is, especially when the number of files open for write is small.
This can be considered in the future when needed.
* add a private method (like {{getINodesInLease}}) to wrap
{code}
   synchronized (this) {
      inodes = new INode[leasesById.size()];
      for (long inodeId : leasesById.keySet()) {
        inodes[inodeCount] = fsnamesystem.getFSDirectory().getInode(inodeId);
        inodeCount++;
      }
    }
{code}
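
To illustrate the rename and javadoc suggestion above, the method could look roughly like the
following (a sketch only; the return type and javadoc wording are my assumption based on the
suggestion, not taken from the patch):
{code}
/**
 * Returns the INodes for all files currently open for write.
 *
 * @param ancestorDir when null, return all files with leases;
 *        when non-null, return only the open files that are
 *        descendants of ancestorDir.
 */
public Set<INodesInPath> getINodesWithLease(final INodeDirectory ancestorDir) {
  ...
}
{code}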

5. In hdfs-default.xml, add a note describing that the file length captured in a snapshot for
an open file is what's recorded in the NameNode at snapshot time; it may be shorter than what
the client has written. In order to capture the length the client has written, the client needs
to call hflush/hsync on the file.
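As a sketch of what such a note could look like (the property name is the one proposed in the
issue description, and the description wording is only a suggestion):
{code}
<property>
  <name>dfs.namenode.snapshot.freeze.openfiles</name>
  <value>false</value>
  <description>
    If true, snapshots additionally capture the current length of files
    that are open for write. The length captured is the one recorded in
    the NameNode at snapshot time, which may be shorter than what the
    client has written; clients should call hflush/hsync to make their
    latest written length visible to the NameNode.
  </description>
</property>
{code}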
6. Suggest adding a test about snapshot diff.

Hi [~jingzhao], I wonder if you could help with a review too. Much appreciated.

Thanks.





> HDFS Snapshots should capture point-in-time copies of OPEN files
> ----------------------------------------------------------------
>
>                 Key: HDFS-11402
>                 URL: https://issues.apache.org/jira/browse/HDFS-11402
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs
>    Affects Versions: 2.6.0
>            Reporter: Manoj Govindassamy
>            Assignee: Manoj Govindassamy
>         Attachments: HDFS-11402.01.patch, HDFS-11402.02.patch
>
>
> *Problem:*
> 1. When there are files being written and HDFS Snapshots are taken in parallel, the
snapshots do capture all these files, but these in-progress files in the snapshots do not have
their point-in-time file length captured. That is, these open files are not frozen in HDFS
Snapshots; they grow/shrink in length, just like the original file, even after the snapshot
time.
> 2. At the time of file close or any other metadata modification operation on these files,
HDFS reconciles the file length and records the modification in the last taken snapshot. All
previously taken snapshots continue to have those open files with no modification recorded,
so all those previous snapshots end up using the final modification record in the last snapshot.
Thus, after the file close, the file lengths in all those snapshots end up the same.
> Assume File1 is opened for write and a total of 1MB is written to it. While the writes are
happening, snapshots are taken in parallel.
> {noformat}
> |---Time---T1-----------T2-------------T3----------------T4------>
> |-----------------------Snap1----------Snap2-------------Snap3--->
> |---File1.open---write---------write-----------close------------->
> {noformat}
> Then at time,
> T2:
> Snap1.File1.length = 0
> T3:
> Snap1.File1.length = 0
> Snap2.File1.length = 0
> <File1 write completed and closed>
> T4:
> Snap1.File1.length = 1MB
> Snap2.File1.length = 1MB
> Snap3.File1.length = 1MB
> *Proposal*
> 1. At the time of taking Snapshot, {{SnapshotManager#createSnapshot}} can optionally
request {{DirectorySnapshottableFeature#addSnapshot}} to freeze open files. 
> 2. {{DirectorySnapshottableFeature#addSnapshot}} can consult {{LeaseManager}} and
get a list of {{INodesInPath}} for all open files under the snapshot dir.
> 3. After the snapshot creation, diff creation and modification-time update,
{{DirectorySnapshottableFeature#addSnapshot}} can invoke {{INodeFile#recordModification}} for
each of the open files. This way, the snapshot just taken will have a {{FileDiff}} with
{{fileSize}} captured for each of the open files.
> 4. The above model follows the current Snapshot and Diff protocols and doesn't introduce
any new disk formats. So, I don't think we will need any new FSImage Loader/Saver changes
for Snapshots.
> 5. One of the design goals of HDFS Snapshots is the ability to take any number of snapshots
in O(1) time. Though {{LeaseManager}} has all the open files with leases in an in-memory map,
an iteration is still needed to prune down to the needed open files and then run
{{recordModification}} on each of them. So, it will not be strictly O(1) with the above
proposal. But it is going to be only a marginal increase, as the new order will be
O(open_files_under_snap_dir). In order to avoid a change in HDFS Snapshots' behavior for open
files and a change in time complexity, this improvement can be made under a new config
{{"dfs.namenode.snapshot.freeze.openfiles"}} which by default can be {{false}}.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
