Date: Mon, 27 May 2019 08:30:00 +0000 (UTC)
From: "Siyao Meng (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Created] (HDFS-14514) Actual read size of open file in encryption zone still larger than listing size even after enabling HDFS-11402 in Hadoop 2

Siyao Meng created HDFS-14514:
---------------------------------

Summary: Actual read size of open file in encryption zone still larger than listing size even after enabling HDFS-11402 in Hadoop 2
Key: HDFS-14514
URL:
https://issues.apache.org/jira/browse/HDFS-14514
Project: Hadoop HDFS
Issue Type: Bug
Components: hdfs, snapshots
Affects Versions: 2.7.7, 2.8.5, 2.9.2, 2.6.5
Reporter: Siyao Meng
Assignee: Siyao Meng

In Hadoop 2, when a file in an *encryption zone* is kept open for write, a snapshot is taken, and the file is then appended to, the size of the file as read from the snapshot is larger than the size shown in the listing. This happens even when the immutable snapshot behavior from HDFS-11402 is enabled.

Note: The HDFS-8905 refactor in Hadoop 3.0 silently fixed this bug (probably incidentally), but Hadoop 2.x releases still suffer from it.

Thanks [~sodonnell] for locating the root cause in the codebase.

Repro:
1. Set dfs.namenode.snapshot.capture.openfiles to true in hdfs-site.xml, then start the HDFS cluster.
2. Create an empty directory /dataenc, create an encryption zone on it, and allow snapshots on it:
{code:bash}
hadoop key create reprokey
sudo -u hdfs hdfs dfs -mkdir /dataenc
sudo -u hdfs hdfs crypto -createZone -keyName reprokey -path /dataenc
sudo -u hdfs hdfs dfsadmin -allowSnapshot /dataenc
{code}
3. Use a client that keeps a file open for write under /dataenc. For example, I'm using a Flume HDFS sink to tail a local file.
4. Append to the file several times using the client, keeping the file open.
5. Create a snapshot:
{code:bash}
sudo -u hdfs hdfs dfs -createSnapshot /dataenc snap1
{code}
6. Append to the file one or more times, but don't let the file size exceed the block size. Wait several seconds for the appended data to be flushed to the DataNode.
7. Run -ls on the file inside the snapshot, then read the file with -get. The number of bytes actually read is larger than the size reported by -ls.

The patch and an updated unit test will be uploaded later.
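The check in steps 1 and 7 can be scripted so the two sizes are compared automatically. The following is a minimal bash sketch, not part of the original report. The function names are hypothetical, and it assumes the hdfs CLI is on PATH and that the fifth column of hdfs dfs -ls output is the file length in bytes. It streams the file with -cat instead of copying it locally with -get; both read the bytes through the same client read path.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the HDFS-14514 repro (names are illustrative only).

# Step 1 precheck (assumption): reads the key from the local client
# configuration, which may differ from what the running NameNode loaded.
capture_openfiles_enabled() {
  [ "$(hdfs getconf -confKey dfs.namenode.snapshot.capture.openfiles 2>/dev/null)" = "true" ]
}

# Compare a listed size with the number of bytes actually read.
# Prints a one-line verdict; returns 0 when the sizes match, 1 otherwise.
compare_sizes() {
  local listed="$1" actual="$2"
  if [ "$actual" -eq "$listed" ]; then
    echo "OK: listed=${listed} read=${actual}"
    return 0
  fi
  if [ "$actual" -gt "$listed" ]; then
    # This is the symptom described in the report.
    echo "MISMATCH: listed=${listed} read=${actual} (read is larger)"
  else
    echo "MISMATCH: listed=${listed} read=${actual} (read is smaller)"
  fi
  return 1
}

# Step 7: compare the -ls size of a snapshot file with the bytes read back.
check_snapshot_file() {
  local path="$1" listed actual
  # Length as reported by the listing (fifth column of 'hdfs dfs -ls').
  listed=$(hdfs dfs -ls "$path" | awk '!/^Found/ {print $5}')
  # Bytes a client actually reads; tr strips any padding that wc may add.
  actual=$(hdfs dfs -cat "$path" | wc -c | tr -d '[:space:]')
  compare_sizes "$listed" "$actual"
}
```

Usage: source the script, then run check_snapshot_file /dataenc/.snapshot/snap1/<file>, replacing <file> with the name of the file the client keeps open. On an affected Hadoop 2 cluster the report suggests this prints a "read is larger" mismatch.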