From: "Evgeny Ryabitskiy (JIRA)"
To: hbase-dev@hadoop.apache.org
Reply-To: hbase-dev@hadoop.apache.org
Date: Sat, 28 Feb 2009 16:56:12 -0800 (PST)
Subject: [jira] Updated: (HBASE-1084) Reinitializable DFS client
Message-ID: <963768428.1235868972792.JavaMail.jira@brutus>
In-Reply-To: <277646460.1229972564320.JavaMail.jira@brutus>

     [ https://issues.apache.org/jira/browse/HBASE-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Evgeny Ryabitskiy updated HBASE-1084:
-------------------------------------

    Attachment: HBASE-1084_HRegionServer.java.patch

Changed protected boolean checkFileSystem() to first try to reinitialize the DFS, and to shut down only if that fails.

> Reinitializable DFS client
> --------------------------
>
>                 Key: HBASE-1084
>                 URL: https://issues.apache.org/jira/browse/HBASE-1084
>             Project: Hadoop HBase
>          Issue Type: Improvement
>          Components: io, master, regionserver
>            Reporter: Andrew Purtell
>            Assignee: Evgeny Ryabitskiy
>             Fix For: 0.20.0
>
>         Attachments: HBASE-1084_HRegionServer.java.patch
>
>
> HBase is the only long-lived DFS client. Tasks handle DFS errors by dying. HBase daemons do not; they depend instead on the dfsclient's error recovery capability, which is not sufficiently developed or tested. Several issues are a result:
> * HBASE-846: hbase looses its mind when hdfs fills
> * HBASE-879: When dfs restarts or moves blocks around, hbase regionservers don't notice
> * HBASE-932: Regionserver restart
> * HBASE-1078: "java.io.IOException: Could not obtain block": allthough file is there and accessible through the dfs client
> * hlog indefinitely hung on getting new blocks from dfs on apurtell cluster
> * regions closed due to transient DFS problems during loaded cluster restart
> These issues might also be related:
> * HBASE-15: Could not complete hdfs write out to flush file forcing regionserver restart
> * HBASE-667: Hung regionserver; hung on hdfs: writeChunk, DFSClient.java:2126, DataStreamer socketWrite
> HBase should compensate by reinitializing the fs a few times, with backoff, upon catching fs exceptions. This can be done with a wrapper around all fs operations that releases references to the old fs instance, then creates and initializes a new instance and retries. All fs users (hlog, memcache flusher, hstore, etc.) would need to be fixed up to handle loss of state around fs wrapper invocations.
> Cases of clearly unrecoverable failure (are there any?) should be excepted.
> Once the fs wrapper is in place, error recovery scenarios can be tested by forcing reinitialization of the fs during PE or other test cases.
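
For illustration only, here is a minimal sketch of the wrapper idea described in the issue: retry an fs operation a bounded number of times, sleeping with exponential backoff and replacing the fs instance between attempts, and rethrow once retries run out so the caller can shut down. Every name in it (ReinitializingFs, FsOp, call, reinitCount) is hypothetical; it is not code from the attached patch and not HBase or Hadoop API. The fs type is left generic so the sketch runs without Hadoop on the classpath.

```java
import java.io.IOException;
import java.util.function.Supplier;

/** Hypothetical wrapper sketch; not part of HBase or Hadoop. */
final class ReinitializingFs<F> {

    /** An fs operation that may fail with an IOException. */
    interface FsOp<F, R> {
        R apply(F fs) throws IOException;
    }

    private final Supplier<F> factory;    // builds a fresh fs instance
    private final int maxReinits;         // reinitializations allowed per call
    private final long initialBackoffMs;  // first sleep; doubled after each failure
    private F fs;
    private int reinitCount = 0;

    ReinitializingFs(Supplier<F> factory, int maxReinits, long initialBackoffMs) {
        this.factory = factory;
        this.maxReinits = maxReinits;
        this.initialBackoffMs = initialBackoffMs;
        this.fs = factory.get();
    }

    /** Runs op, reinitializing the fs and retrying when it throws IOException. */
    synchronized <R> R call(FsOp<F, R> op) throws IOException {
        long backoff = initialBackoffMs;
        for (int attempt = 0; ; attempt++) {
            try {
                return op.apply(fs);
            } catch (IOException e) {
                if (attempt >= maxReinits) {
                    throw e; // out of retries: the caller decides whether to shut down
                }
                try {
                    Thread.sleep(backoff);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IOException("interrupted during backoff", ie);
                }
                backoff *= 2;
                fs = factory.get(); // release the old instance and build a new one
                reinitCount++;
            }
        }
    }

    /** Total number of reinitializations performed so far. */
    synchronized int reinitCount() {
        return reinitCount;
    }
}
```

In a region server, the factory would presumably create a new DFS client and checkFileSystem() would go through call(), shutting down only when it throws. That wiring is an assumption and is not shown here. Callers would still have to cope with state lost across a reinitialization, as the issue notes.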
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.