Mailing-List: contact hbase-dev-help@hadoop.apache.org; run by ezmlm
Reply-To: hbase-dev@hadoop.apache.org
Message-ID: <1663287264.1238189510644.JavaMail.jira@brutus>
Date: Fri, 27 Mar 2009 14:31:50 -0700 (PDT)
From: "Jim Kellerman (JIRA)"
To: hbase-dev@hadoop.apache.org
Subject: [jira] Commented: (HBASE-7) [hbase] Provide a HBase checker and repair tool similar to fsck

    [ https://issues.apache.org/jira/browse/HBASE-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12690114#action_12690114 ]

Jim Kellerman commented on HBASE-7:
-----------------------------------

There are (at least) three areas where we are still vulnerable:

1. Incomplete table deletion (see above).
2. Incomplete cache flush, when a region server dies during the flush (see below).
3. Inability to recover the write-ahead log (HLog) if the server dies. This depends on HADOOP-4379.

HBase protects itself from incomplete compactions by performing the operation in a temporary directory. If the compaction does not complete successfully, another compaction request will be generated and the partially completed compaction data is erased.

We should do something similar for a cache flush: write the flush to a temporary directory and move the new store file into place only if the flush completes successfully. Any subsequent cache flush will erase whatever data is left in the temporary flush directory. Recovery will happen when the HLog is replayed by the new server for the region.

Without HADOOP-4379, we cannot guarantee that we can recover the most recent HLog file. Although Dhruba is looking at the issue, he would probably accept help from someone else.

Getting HADOOP-4379 integrated into Hadoop is the most important thing we can do to ensure data integrity.

The second most important thing to do is to put cache flushes into a temporary directory.

That would leave hbasefsck handling incomplete deletes (and perhaps other inconsistencies in the HBase file structure).

> [hbase] Provide a HBase checker and repair tool similar to fsck
> ---------------------------------------------------------------
>
>                 Key: HBASE-7
>                 URL: https://issues.apache.org/jira/browse/HBASE-7
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: util
>            Reporter: Jim Kellerman
>             Fix For: 0.20.0
>
>         Attachments: patch.txt
>
>
> We need a tool to verify (and repair) HBase much like fsck

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
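
For illustration only, the temporary-directory flush proposed in the comment could look roughly like the sketch below. It is not HBase code: it uses plain java.nio.file on a local filesystem rather than the Hadoop FileSystem API, and the class name TempDirFlush, the directory name ".tmpflush", and the use of an atomic move are all assumptions made for the example.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class TempDirFlush {
    // Hypothetical name for the temporary flush directory (not taken from HBase).
    static final String TMP_DIR = ".tmpflush";

    /** Erase files left behind by an earlier flush that never completed. */
    static void clearTmp(Path storeDir) throws IOException {
        Path tmp = storeDir.resolve(TMP_DIR);
        Files.createDirectories(tmp);
        try (DirectoryStream<Path> leftovers = Files.newDirectoryStream(tmp)) {
            for (Path p : leftovers) {
                Files.delete(p); // assumes the temp directory holds only plain files
            }
        }
    }

    /**
     * Write the flushed data into the temporary directory, then move it into
     * the store directory only after the write has completed successfully.
     */
    static Path flush(Path storeDir, String fileName, byte[] data) throws IOException {
        clearTmp(storeDir);
        Path tmpFile = storeDir.resolve(TMP_DIR).resolve(fileName);
        // If the process dies here, only the temp file is incomplete.
        Files.write(tmpFile, data);
        Path finalFile = storeDir.resolve(fileName);
        // Assumption: an atomic rename stands in for "move into place".
        return Files.move(tmpFile, finalFile, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

With this shape, a crash before the move leaves the store directory untouched, and the next call to flush() discards the partial file, which mirrors how the comment describes compactions being protected. The edits themselves would still have to come from replaying the HLog.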