Message-ID: <4ABBE78D.4040402@gmail.com>
Date: Thu, 24 Sep 2009 14:41:33 -0700
From: elsif <elsif.then@gmail.com>
To: hbase-user@hadoop.apache.org
Subject: Re: Table recovery options
References: <4ABAAC23.5020700@gmail.com> <7c962aed0909241318o3f565728r7d243fff0fe84250@mail.gmail.com>
In-Reply-To: <7c962aed0909241318o3f565728r7d243fff0fe84250@mail.gmail.com>

Please see comments inline.

stack wrote:
> On Wed, Sep 23, 2009 at 4:15 PM, elsif <elsif.then@gmail.com> wrote:
>
>> We have a couple of clusters running with lzo compression. When
>> testing the new 0.20.1 release
>
> You mean hadoop's new 0.20.1 release?

This is with the hadoop 0.20.1 release and the hbase 0.20 branch,
which results in an hbase-0.20.1-dev.jar.

>> I set up a single node cluster and reused the compression jar and
>> native libraries from the 0.20.0 release.
>
> hadoop 0.20.0 release? What release of hbase are you using?
>
>> The following session log shows a table being created with the lzo
>> option and some rows being added. After hbase is restarted the table
>> is no longer accessible - the region server crashed during the flush
>> operation due to a SIGFPE.
>
> The flush that was done during shutdown? The flush of the .META.? If this
> failed, then state of .META.
> would not have been persisted and yes, you would have lost your table.
>
>> Would it be possible to add a check to verify the compression
>> feature before it is used in a table to avoid corruption? A simple
>> shell or cli option would be great.
>
> Sounds like a good idea. What would you suggest? You could force a
> flush on a table with data on it and check if it worked or not?

A flush would still cause data loss in this scenario, as the region
server crashes from the library mismatch. A standalone cli check that
could be run on each region server node after an install or upgrade,
but before starting any of the hbase daemons, would be better - that
way no data is in jeopardy. I will code something up and submit it
back to the list.

>> In general, once hbase tables are corrupted, is there any way to
>> repair them? In this test case the table is never written to disk.
>
> Depends on the 'corruption'. Generally yes, there are ways. Below it
> seems a compression library mismatch is preventing hbase writing the
> filesystem. Can you fix this and retry?

Fixing the compression library allows new tables to work cleanly. The
original table remains corrupted, which is understandable.

>> Is it possible to regenerate an hbase table from the data files
>> stored in hdfs?
>
> Yes. You'd have to write a script. In hdfs, under each region
> directory there is a file named .regioninfo. It has the content of
> the .META. table for this region serialized. A script could look at
> some subset of the regions on disk -- say, all that make up a table
> -- and do fix up of .META. On the next scan of .META. the table
> should be onlined. Let me know if you'd like some help with this and
> we can work on it together.

That would be great. Do you have any samples or pseudo-code for the
operation? Is there any documentation on the specific file contents?
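For reference, here is my rough understanding of what such a script
would do, as an untested sketch. The class and constant names are my
best guess at the 0.20 client API, and the table path argument is
hypothetical - please correct me where I have it wrong:

```java
// Untested sketch: for each region directory of a table, read the
// serialized HRegionInfo from .regioninfo and re-insert it as the
// info:regioninfo cell of .META.  API names are my guesses against
// the 0.20 client libraries; the path argument is hypothetical.
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Writables;

public class RebuildMeta {
  public static void main(String[] args) throws IOException {
    HBaseConfiguration conf = new HBaseConfiguration();
    FileSystem fs = FileSystem.get(conf);
    Path tableDir = new Path(args[0]);  // e.g. /hbase/mytable (hypothetical)
    HTable meta = new HTable(conf, HConstants.META_TABLE_NAME);
    for (FileStatus regionDir : fs.listStatus(tableDir)) {
      Path info = new Path(regionDir.getPath(), ".regioninfo");
      if (!fs.exists(info)) continue;   // skip non-region directories
      HRegionInfo hri = new HRegionInfo();
      hri.readFields(fs.open(info));    // .regioninfo is a serialized Writable
      Put p = new Put(hri.getRegionName());
      p.add(HConstants.CATALOG_FAMILY, HConstants.REGIONINFO_QUALIFIER,
          Writables.getBytes(hri));
      meta.put(p);                      // restore the catalog row
    }
    meta.flushCommits();
  }
}
```

If something like that is on the right track, then as you say the
master would pick the rows up and online the table on the next scan of
.META.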
>> Are there any preventative measures we can take to make it easier
>> to roll back to a valid state?
>
> You can backup hbase content if it's small, or you can scan content
> from before the date at which invalid data shows. What else would you
> like?

Is there any benefit in storing snapshots of the .regioninfo file?
I'm guessing the table would have to be disabled during the copy? It
would be nice if there were a way to verify the health of a table and
report on any inconsistencies.

> St.Ack
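P.S. For the standalone check I offered to code up above, my first
thought is a throwaway JVM that round-trips a small buffer through the
codec before any daemon starts, so a mismatched native library crashes
this process instead of a region server. Untested sketch - the codec
class name varies by install, so com.hadoop.compression.lzo.LzoCodec
here is an assumption:

```java
// Untested sketch of a pre-start compression check: round-trip a
// buffer through the LZO codec so a native-library mismatch kills
// this throwaway JVM rather than a region server.  The codec class
// name is an assumption; adjust it to whatever your install provides.
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.util.ReflectionUtils;

public class LzoCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    CompressionCodec codec = (CompressionCodec) ReflectionUtils.newInstance(
        conf.getClassByName("com.hadoop.compression.lzo.LzoCodec"), conf);
    byte[] plain = "lzo pre-start smoke test".getBytes();
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    OutputStream out = codec.createOutputStream(buf);  // compress
    out.write(plain);
    out.close();
    InputStream in = codec.createInputStream(
        new ByteArrayInputStream(buf.toByteArray()));  // decompress
    byte[] roundTrip = new byte[plain.length];
    int off = 0, n;
    while (off < roundTrip.length
        && (n = in.read(roundTrip, off, roundTrip.length - off)) > 0) {
      off += n;
    }
    if (off != plain.length || !Arrays.equals(plain, roundTrip)) {
      throw new IllegalStateException("lzo round-trip failed");
    }
    System.out.println("lzo OK");
  }
}
```

Run on each node after an install or upgrade, before starting the
daemons: either it prints "lzo OK" or it dies, and either way no table
data is at risk.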