Date: Wed, 7 Jun 2017 18:38:18 +0000 (UTC)
From: "Wei-Chiu Chuang (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Comment Edited] (HDFS-11711) DN should not delete the block On "Too many open files" Exception

    [ https://issues.apache.org/jira/browse/HDFS-11711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16041353#comment-16041353 ]

Wei-Chiu Chuang edited comment on HDFS-11711 at 6/7/17 6:37 PM:
----------------------------------------------------------------

[~brahmareddy] Sorry, I didn't make myself clear. To begin with, this behavior was caused by HDFS-8492, which throws FileNotFoundException("BlockId " + blockId + " is not valid.").

I was just thinking that the "Too many open files" error message is produced inside the Java library, so there is no guarantee it is consistent across different operating systems, Java versions, or JVM/JDK implementations. IMHO, the more compatible approach would be to check whether the FNFE message is "BlockId " + blockId + " is not valid.", and only delete the block when that is the case.

Edit: HDFS-3100 throws FileNotFoundException("Meta-data not found for " + block) when the block's meta (checksum) file is not found, so this case should be checked as well. Alternatively, a new exception type could be thrown in these two cases.
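A rough sketch of the message-matching idea, just to illustrate (the helper class, method name, and call site are hypothetical, not the actual DataNode code; only the two message strings come from HDFS-8492 and HDFS-3100):

{code:java}
import java.io.FileNotFoundException;

/**
 * Hypothetical helper: only treat a FileNotFoundException as proof that the
 * block is really missing when its message matches the strings thrown by
 * HDFS-8492 / HDFS-3100. Environment-dependent errors such as
 * "Too many open files" then no longer trigger block deletion.
 */
public final class MissingBlockCheck {

  static boolean indicatesMissingBlock(FileNotFoundException fnfe, long blockId) {
    String msg = fnfe.getMessage();
    if (msg == null) {
      return false;
    }
    // HDFS-8492: FileNotFoundException("BlockId " + blockId + " is not valid.")
    if (msg.equals("BlockId " + blockId + " is not valid.")) {
      return true;
    }
    // HDFS-3100: FileNotFoundException("Meta-data not found for " + block)
    if (msg.startsWith("Meta-data not found for ")) {
      return true;
    }
    // Anything else (e.g. an EMFILE "Too many open files" surfaced by the JDK)
    // is not proof that the block is gone, so do not delete it.
    return false;
  }

  // Tiny demo with fabricated exceptions, just to show the intent.
  public static void main(String[] args) {
    long blockId = 1073741825L;
    FileNotFoundException reallyMissing =
        new FileNotFoundException("BlockId " + blockId + " is not valid.");
    FileNotFoundException tooManyOpenFiles =
        new FileNotFoundException("/data/dn/current/blk_" + blockId
            + " (Too many open files)");
    System.out.println(indicatesMissingBlock(reallyMissing, blockId));    // true
    System.out.println(indicatesMissingBlock(tooManyOpenFiles, blockId)); // false
  }
}
{code}

Matching on message strings is of course brittle in its own way, which is why a dedicated exception type for the two "block really missing" cases might be the cleaner fix.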
> DN should not delete the block On "Too many open files" Exception
> ------------------------------------------------------------------
>
>                 Key: HDFS-11711
>                 URL: https://issues.apache.org/jira/browse/HDFS-11711
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>            Reporter: Brahma Reddy Battula
>            Assignee: Brahma Reddy Battula
>            Priority: Critical
>             Fix For: 2.9.0, 3.0.0-alpha4, 2.8.2
>
>         Attachments: HDFS-11711-002.patch, HDFS-11711-003.patch, HDFS-11711-004.patch, HDFS-11711-branch-2-002.patch, HDFS-11711-branch-2-003.patch, HDFS-11711.patch
>
>
> *Seen the following scenario in one of our customer environments:*
> * While the job client was writing {{"job.xml"}}, there were pipeline failures and the data was written to only one DN.
> * When the mapper read {{"job.xml"}}, the DN hit {{"Too many open files"}} (the system exceeded its open-file limit) and the block got deleted, so the mapper failed to read it and the job failed.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org