Date: Wed, 22 Feb 2017 19:19:44 +0000 (UTC)
From: "Ravi Prakash (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-11402) HDFS Snapshots should capture point-in-time copies of OPEN files

    [ https://issues.apache.org/jira/browse/HDFS-11402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15879013#comment-15879013 ]

Ravi Prakash commented on HDFS-11402:
-------------------------------------

I'm sorry, I don't fully understand why we care about HDFS-11435. In fact it makes heartbeat handling a lot slower and more complicated. And for what? Slightly more up-to-date length values in Snapshots (a relatively rare operation). If the DNs send back lengths of RBW every heartbeat, you'll still have to choose the minimum length returned by at least {{dfs.namenode.replication.min}} nodes. IMHO we should store in the snapshot the last minimum length up to which the NN knows data has been written. If that's the last block boundary, so be it.

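To make the length selection in the comment above concrete, here is a minimal sketch in plain Java. It is an illustration only, not NameNode code: the class {{SnapshotLengthSketch}}, the method {{safeLength}} and its parameters are hypothetical names, and it assumes one reading of the comment, namely that the length to record is the largest length that at least {{dfs.namenode.replication.min}} replicas report having reached. When too few replicas have reported, it falls back to a length the NN already knows, such as the last block boundary.

```java
import java.util.Arrays;

// Hypothetical illustration; none of these names exist in Hadoop.
final class SnapshotLengthSketch {

    /**
     * Returns the largest length that at least minReplication replicas
     * report having reached. If fewer than minReplication replicas have
     * reported, returns fallbackLength (for example, the last block
     * boundary already known to the NameNode).
     */
    static long safeLength(long[] reportedLengths, int minReplication, long fallbackLength) {
        if (minReplication < 1) {
            throw new IllegalArgumentException("minReplication must be >= 1");
        }
        if (reportedLengths.length < minReplication) {
            return fallbackLength;
        }
        long[] sorted = reportedLengths.clone();
        Arrays.sort(sorted); // ascending order
        // The minReplication-th largest value: at least minReplication
        // replicas have a length greater than or equal to it.
        return sorted[sorted.length - minReplication];
    }

    public static void main(String[] args) {
        long[] reported = {100L, 80L, 120L};
        System.out.println(safeLength(reported, 2, 0L)); // prints 100
    }
}
```

With three replicas reporting 100, 80 and 120 bytes and a minimum replication of 2, the sketch picks 100, since two replicas are known to hold at least that much.
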
> HDFS Snapshots should capture point-in-time copies of OPEN files
> ----------------------------------------------------------------
>
>                 Key: HDFS-11402
>                 URL: https://issues.apache.org/jira/browse/HDFS-11402
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs
>    Affects Versions: 2.6.0
>            Reporter: Manoj Govindassamy
>            Assignee: Manoj Govindassamy
>         Attachments: HDFS-11402.01.patch, HDFS-11402.02.patch
>
>
> *Problem:*
> 1. When files are being written and HDFS Snapshots are taken in parallel, the Snapshots do capture all these files, but the files being written do not have their point-in-time length captured in the Snapshots. That is, these open files are not frozen in HDFS Snapshots. These open files grow/shrink in length, just like the original file, even after the snapshot time.
> 2. At the time of file close, or of any other metadata modification operation on these files, HDFS reconciles the file length and records the modification in the last taken Snapshot. All the previously taken Snapshots continue to have those open files with no modification recorded. So, all those previous Snapshots end up using the final modification record in the last Snapshot. Thus, after the file close, the file lengths in all those Snapshots will end up the same.
> Assume File1 is opened for write and a total of 1MB is written to it. While the writes are happening, snapshots are taken in parallel.
> {noformat}
> |---Time---T1-----------T2-------------T3----------------T4------>
> |-----------------------Snap1----------Snap2-------------Snap3--->
> |---File1.open---write---------write-----------close------------->
> {noformat}
> Then at time,
> T2:
> Snap1.File1.length = 0
> T3:
> Snap1.File1.length = 0
> Snap2.File1.length = 0
> 
> T4:
> Snap1.File1.length = 1MB
> Snap2.File1.length = 1MB
> Snap3.File1.length = 1MB
> *Proposal*
> 1. At the time of taking a Snapshot, {{SnapshotManager#createSnapshot}} can optionally request {{DirectorySnapshottableFeature#addSnapshot}} to freeze open files.
> 2. {{DirectorySnapshottableFeature#addSnapshot}} can consult with {{LeaseManager}} and get a list of INodesInPath for all open files under the snapshot dir.
> 3. {{DirectorySnapshottableFeature#addSnapshot}}, after the Snapshot creation, Diff creation and updating of the modification time, can invoke {{INodeFile#recordModification}} for each of the open files. This way, the Snapshot just taken will have a {{FileDiff}} with {{fileSize}} captured for each of the open files.
> 4. The above model follows the current Snapshot and Diff protocols and doesn't introduce any new disk formats. So, I don't think we will need any new FSImage Loader/Saver changes for Snapshots.
> 5. One of the design goals of HDFS Snapshots was the ability to take any number of snapshots in O(1) time. Though LeaseManager holds all the open files with leases in an in-memory map, an iteration is still needed to pick out the relevant open files and then run recordModification on each of them. So, it will not be strictly O(1) with the above proposal. But it's going to be only a marginal increase, as the new order will be O(open_files_under_snap_dir). In order to avoid a change in HDFS Snapshot behavior for open files and a change in time complexity, this improvement can be made under a new config {{"dfs.namenode.snapshot.freeze.openfiles"}}, which can be {{false}} by default.
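
The proposal quoted above can be sketched as a toy model in plain Java. This is not the attached patch and uses no Hadoop classes: {{FreezeOpenFilesSketch}} and its methods are hypothetical stand-ins for the roles of {{LeaseManager}} (the map of open files) and {{FileDiff}} (the frozen length per snapshot). It also assumes, as a simplification, that the length known to the NameNode advances with every write, which real HDFS does not guarantee for a block still under construction.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy model; none of these names exist in Hadoop.
final class FreezeOpenFilesSketch {
    // path -> current length of each open file (stand-in for the lease map)
    private final Map<String, Long> openFiles = new TreeMap<>();
    // snapshot name -> (path -> length frozen at snapshot time)
    private final Map<String, Map<String, Long>> frozen = new HashMap<>();

    void write(String path, long bytes) {
        openFiles.merge(path, bytes, Long::sum);
    }

    /** Takes a snapshot of snapshotDir; returns how many open files were frozen. */
    int createSnapshot(String snapshotDir, String name, boolean freezeOpenFiles) {
        Map<String, Long> lengths = new HashMap<>();
        if (freezeOpenFiles) {
            String prefix = snapshotDir.endsWith("/") ? snapshotDir : snapshotDir + "/";
            // Iterate over the open files, keep those under the snapshot dir,
            // and record each one's current length.
            for (Map.Entry<String, Long> e : openFiles.entrySet()) {
                if (e.getKey().startsWith(prefix)) {
                    lengths.put(e.getKey(), e.getValue());
                }
            }
        }
        frozen.put(name, lengths);
        return lengths.size();
    }

    /** Length of path as seen through the named snapshot. */
    long lengthIn(String name, String path) {
        Map<String, Long> lengths = frozen.get(name);
        Long recorded = (lengths == null) ? null : lengths.get(path);
        // Without a frozen record the snapshot follows the live file.
        return recorded != null ? recorded : openFiles.getOrDefault(path, 0L);
    }

    public static void main(String[] args) {
        FreezeOpenFilesSketch fs = new FreezeOpenFilesSketch();
        fs.write("/data/File1", 512L);
        fs.createSnapshot("/data", "Snap1", true);
        fs.write("/data/File1", 512L);
        System.out.println(fs.lengthIn("Snap1", "/data/File1")); // prints 512
    }
}
```

With freezing enabled, Snap1 keeps reporting the length it saw when it was taken; with it disabled, the snapshot's view of the file keeps following the live length, which is the behavior the issue describes as the problem.
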