Date: Tue, 8 Jan 2013 05:30:17 +0000 (UTC)
From: "chunhui shen (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-7507) Make memstore flush be able to retry after exception

    [ https://issues.apache.org/jira/browse/HBASE-7507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546622#comment-13546622 ]

chunhui shen commented on HBASE-7507:
-------------------------------------

bq. Why moving the location of validateStoreFile() call ?
We can't retry inside HStore#commitFile, but we can retry if validateStoreFile() fails, so I moved the call.

[~stack]
bq. Should we open a new issue to retry all hdfs operations?
We perform HDFS operations for HFile and HLog, and we can already tolerate IO errors in HLog.
So I think retrying the flush is enough, since IO errors during compaction do not matter.

For the other comments, I will address them in a new patch.

Thanks

> Make memstore flush be able to retry after exception
> ----------------------------------------------------
>
>                 Key: HBASE-7507
>                 URL: https://issues.apache.org/jira/browse/HBASE-7507
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.3
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>             Fix For: 0.96.0
>
>         Attachments: 7507-trunk v1.patch
>
>
> We abort the regionserver if a memstore flush throws an exception.
> I think we could retry to make the regionserver more stable, because the file system may be unavailable for a short time, e.g. while switching namenodes in a NameNode HA environment.
> {code}
> HRegion#internalFlushcache(){
>   ...
>   try {
>     ...
>   } catch (Throwable t) {
>     // Any failure during the flush is wrapped in a DroppedSnapshotException.
>     DroppedSnapshotException dse = new DroppedSnapshotException("region: " +
>         Bytes.toStringBinary(getRegionName()));
>     dse.initCause(t);
>     throw dse;
>   }
>   ...
> }
> MemStoreFlusher#flushRegion(){
>   ...
>   try {
>     region.flushcache();
>     ...
>   } catch (DroppedSnapshotException ex) {
>     // A single failed flush currently aborts the whole regionserver.
>     server.abort("Replay of HLog required. Forcing server shutdown", ex);
>   }
>   ...
> }
> {code}
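
For illustration only, the retry idea described in this issue could be sketched roughly as below. This is not the attached patch and not HBase code: `FlushRetrySketch`, `FlushAttempt`, `flushWithRetry`, `maxAttempts`, and `pauseMillis` are all hypothetical names, and the sketch retries only on `IOException`, whereas the existing code wraps any `Throwable`. It assumes the caller would still abort the regionserver if the exception escapes after the last attempt.

```java
import java.io.IOException;
import java.io.InterruptedIOException;

/** Hypothetical sketch of a bounded flush retry; none of these names are HBase APIs. */
final class FlushRetrySketch {

    /** Stand-in for one flush attempt that may hit a transient file system error. */
    interface FlushAttempt {
        void run() throws IOException;
    }

    /**
     * Runs the attempt up to maxAttempts times, pausing pauseMillis after each
     * failure except the last. Returns the number of attempts used on success.
     * Rethrows the last IOException once all attempts have failed, so the
     * caller can fall back to its existing handling (for example, aborting).
     */
    static int flushWithRetry(FlushAttempt attempt, int maxAttempts, long pauseMillis)
            throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int i = 1; i <= maxAttempts; i++) {
            try {
                attempt.run();
                return i;
            } catch (IOException e) {
                last = e;
                if (i < maxAttempts) {
                    try {
                        Thread.sleep(pauseMillis);
                    } catch (InterruptedException ie) {
                        // Restore the interrupt flag and stop retrying.
                        Thread.currentThread().interrupt();
                        InterruptedIOException iioe =
                            new InterruptedIOException("interrupted while waiting to retry flush");
                        iioe.initCause(e);
                        throw iioe;
                    }
                }
            }
        }
        throw last;
    }
}
```

In this sketch a short file system outage, such as a namenode switch, is absorbed as long as it ends before the attempts run out. A longer outage still surfaces the original exception to the caller.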