Date: Fri, 10 Apr 2015 02:48:12 +0000 (UTC)
From: "Sean Busbey (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Comment Edited] (HBASE-13396) Cleanup unclosed writers in later writer rolling

    [ https://issues.apache.org/jira/browse/HBASE-13396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14488795#comment-14488795 ]

Sean Busbey edited comment on HBASE-13396 at 4/10/15 2:47 AM:
--------------------------------------------------------------

{code}
+      } catch (IOException e) {
+        if (e instanceof FileNotFoundException) {
+          unclosedWriters.remove(writer);
+        } else {
+          LOG.error("Cleanup unclosed writer failed.", e);
+          unclosedWriters.put(writer, EnvironmentEdgeManager.currentTime());
+        }
+      }
{code}

Please add a comment that explains the reasoning behind this special handling. Why can the cleanup thread move the file while we still have it open, without either losing the lease? Is it because we're in the same DFSClient instance? Should we change that?


was (Author: busbey):
{quote}
+      } catch (IOException e) {
+        if (e instanceof FileNotFoundException) {
+          unclosedWriters.remove(writer);
+        } else {
+          LOG.error("Cleanup unclosed writer failed.", e);
+          unclosedWriters.put(writer, EnvironmentEdgeManager.currentTime());
+        }
+      }
{quote}

Please add a comment that explains the reasoning behind this special handling. Why can the cleanup thread move the file while we still have it open, without either losing the lease? Is it because we're in the same DFSClient instance? Should we change that?

> Cleanup unclosed writers in later writer rolling
> ------------------------------------------------
>
>                 Key: HBASE-13396
>                 URL: https://issues.apache.org/jira/browse/HBASE-13396
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Liu Shaohui
>            Assignee: Liu Shaohui
>            Priority: Minor
>             Fix For: 2.0.0
>
>         Attachments: HBASE-13396-v1.diff, HBASE-13396-v2.diff, HBASE-13396-v3.diff
>
>
> Currently, the default value of hbase.regionserver.logroll.errors.tolerated is 2, which means a regionserver can tolerate at most two consecutive failures to close writers. Temporary network or namenode problems may cause those failures. After those failures, the HDFS client in the RS may continue to renew the lease on the writer's hlog, so the namenode will not recover the lease on this hlog. As a result, the last block of this hlog stays in the RBW (replica being written) state until the regionserver goes down. Blocks in this state block datanode decommissioning and other operations in HDFS.
> So I think we need a mechanism to clean up those unclosed writers afterwards. A simple solution is to record those unclosed writers and keep retrying the close until it succeeds.
> Discussion and suggestions are welcome. Thanks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)