From: lars hofhansl
To: "user@hbase.apache.org"
Subject: Re: Recovering corrupt HLog files
Date: Sat, 30 Jun 2012 20:43:30 -0700 (PDT)
Message-ID: <1341114210.32180.YahooMailNeo@web121704.mail.ne1.yahoo.com>

I added it in HBase 0.94, but the code is pretty isolated and could be easily (I think) ported to HBase 0.90.
It could potentially just be turned into a separate jar file.

As for whether it'll do the right thing...
It uses the timestamps you provide solely to pick the right set of HLogs, instead of playing them all.
But since all operations (except Increment/Append) are idempotent given their timestamps, playing them again has no effect (i.e. your newer versions will still be visible, since they have a newer timestamp).

As these files are corrupt, there's no way of knowing how far WALPlayer will get in replaying them. It will at least play up to the first corruption it encounters.

-- Lars


----- Original Message -----
From: Bryan Beaudreault
To: user@hbase.apache.org
Cc:
Sent: Saturday, June 30, 2012 12:29 PM
Subject: Re: Recovering corrupt HLog files

I should have mentioned in my initial email that I am operating on HBase
0.90.4. Is WALPlayer available in this version? I am having trouble
finding it or anything similar.

On Sat, Jun 30, 2012 at 1:14 PM, Li Pi wrote:

> WALPlayer will look at the timestamp. Replaying an older edit that has
> since been overwritten shouldn't change anything.
>
> On Sat, Jun 30, 2012 at 9:49 AM, Bryan Beaudreault <bbeaudreault@hubspot.com> wrote:
>
> > They are all pretty large, around 40+mb. Will the walplayer be smart
> > enough to only write edits that still look relevant (i.e. based on
> > timestamps of the edits vs timestamps of the versions in hbase)? Writes
> > have been coming in since we recovered.
> >
> > On Sat, Jun 30, 2012 at 11:05 AM, Stack wrote:
> >
> > > On Sat, Jun 30, 2012 at 8:38 AM, Bryan Beaudreault wrote:
> > > > 12/06/30 00:00:48 INFO wal.HLogSplitter: Got while parsing hlog
> > > > hdfs://my-namenode-ip-addr:8020/hbase/.logs/my-rs-ip-addr,60020,1338667719591/my-rs-ip-addr%3A60020.1340935453874.
> > > > Marking as corrupted
> > >
> > > What size do these logs have?
> > >
> > > > We are back to stable operating now, and in trying to research this I found
> > > > the hdfs://my-namenode-ip-addr:8020/hbase/.corrupt directory. There are 20
> > > > files listed there.
> > >
> > > Ditto.
> > >
> > > > What are our options for tracking down and potentially recovering any data
> > > > that was lost. Or how can we even tell what was lost, if any? Does the
> > > > existence of these files pretty much guarantee data lost? There doesn't
> > > > seem to be much documentation on this. From reading it seems like it might
> > > > be possible that part of each of these files was recovered.
> > >
> > > If size > 0, could try walplaying them:
> > > http://hbase.apache.org/book.html#walplayer
> > >
> > > St.Ack