Date: Fri, 12 Jan 2018 21:59:00 +0000 (UTC)
From: "Ajay Kumar (JIRA)" <jira@apache.org>
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Comment Edited] (HDFS-12942) Synchronization issue in FSDataSetImpl#moveBlock

    [ https://issues.apache.org/jira/browse/HDFS-12942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16324607#comment-16324607 ]

Ajay Kumar edited comment on HDFS-12942 at 1/12/18 9:58 PM:
------------------------------------------------------------

{quote}While cleaning up the new replica on failure, we call volume.onBlockFileDeletion which decrements the space used for the block on the new target volume. However I am not sure that the space used was incremented in the first place by copyReplicaToVolume. This may be a pre-existing bug actually.{quote}

[~arpitagarwal], [~virajith] good catch.

{code}
try (AutoCloseableLock lock = datasetLock.acquire()) {
  // Increment numBlocks here because this block was moved without the BPS knowing
  FsVolumeImpl volume = (FsVolumeImpl) newReplicaInfo.getVolume();
  volume.incrNumBlocks(block.getBlockPoolId());
}
{code}

It seems that on success we should increment the "dfs used" and the number of blocks for the new volume, and decrement the same for the old volume.
Currently we only increment the number of blocks. Is this a bug or intentional?

[~arpitagarwal], thanks for the offline discussion on this. In case of failure we should free up the space accounted on disk. (Will update the patch for the same.)

{quote}Adding to this, in patch v4, the generation stamp check in finalizeReplica is done after a call to v.addFinalizedBlock which seems wasted work if the check doesn't pass. Can the generation stamp check be done before the call to v.addFinalizedBlock?{quote}

Agree, will move the check to the start of finalizeReplica.
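The symmetric accounting described above can be sketched with a minimal self-contained model. The class and method names here (Volume, BlockMover, accountSuccessfulMove) are illustrative stand-ins, not the real FsVolumeImpl API: the point is only that on success both counters must transfer from the old volume to the new one under the dataset lock, and on failure the new volume's accounting must be rolled back so its space is freed.

```java
import java.util.concurrent.locks.ReentrantLock;

// Toy model of per-volume accounting; names are hypothetical,
// not the actual FsVolumeImpl/FsDatasetImpl API.
class Volume {
    long dfsUsed;   // bytes accounted to this volume
    long numBlocks; // block count for the block pool
}

class BlockMover {
    private final ReentrantLock datasetLock = new ReentrantLock();

    // On a successful move, transfer both counters from the old volume
    // to the new one atomically, so no reader observes the block counted
    // on both volumes (or on neither).
    void accountSuccessfulMove(Volume oldVol, Volume newVol, long blockBytes) {
        datasetLock.lock();
        try {
            newVol.dfsUsed += blockBytes;
            newVol.numBlocks += 1;
            oldVol.dfsUsed -= blockBytes;
            oldVol.numBlocks -= 1;
        } finally {
            datasetLock.unlock();
        }
    }

    // On failure, undo any accounting already done on the new volume
    // so its free space is released.
    void accountFailedMove(Volume newVol, long blockBytes) {
        datasetLock.lock();
        try {
            newVol.dfsUsed -= blockBytes;
            newVol.numBlocks -= 1;
        } finally {
            datasetLock.unlock();
        }
    }
}
```

In this sketch a failed move followed by the rollback leaves the new volume's counters exactly where they started, which is the invariant the cleanup path in the patch needs to preserve.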
> Synchronization issue in FSDataSetImpl#moveBlock
> ------------------------------------------------
>
>                 Key: HDFS-12942
>                 URL: https://issues.apache.org/jira/browse/HDFS-12942
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Ajay Kumar
>            Assignee: Ajay Kumar
>         Attachments: HDFS-12942.001.patch, HDFS-12942.002.patch, HDFS-12942.003.patch, HDFS-12942.004.patch
>
>
> FSDataSetImpl#moveBlock works in the following 3 steps:
> # First, create a new replicaInfo object.
> # Call finalizeReplica to finalize it.
> # Call removeOldReplica to remove the old replica.
> A client can potentially append to the old replica between step 1 and step 2.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org