From: Jonathan Bender
Date: Wed, 27 Apr 2011 08:28:41 -0700
Subject: Re: HDFS reports corrupted blocks after HBase reinstall
Cc: user@hbase.apache.org

So it's definitely a case of HDFS not being able to recover the image.
Maybe this is better directed toward another list, but has anyone had
issues with this, or any suggestions for trying to eradicate this?

2011-04-26 17:15:56,898 INFO org.apache.hadoop.hdfs.server.common.Storage: Recovering storage directory /var/lib/hadoop-0.20/cache/hadoop/dfs/name from failed checkpoint.
2011-04-26 17:15:56,905 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 204
2011-04-26 17:15:57,020 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files under construction = 0
2011-04-26 17:15:57,021 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 26833 loaded in 0 seconds.
2011-04-26 17:15:57,257 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Invalid opcode, reached end of edit log Number of transactions found 528
2011-04-26 17:15:57,258 INFO org.apache.hadoop.hdfs.server.common.Storage: Edits file /var/lib/hadoop-0.20/cache/hadoop/dfs/name/current/edits of size 1049092 edits # 528 loaded in 0 seconds.
2011-04-26 17:15:57,265 ERROR org.apache.hadoop.hdfs.server.common.Storage: Unable to save image for /var/lib/hadoop-0.20/cache/hadoop/dfs/name
java.io.IOException: saveLeases found path /hbase/base_tmp/.logs/sv004.my.domain.com,60020,1302882411768/sv004.my.domain.com%3A60020.1302882412951 but no matching entry in namespace.
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:5153)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.saveFSImage(FSImage.java:1071)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.saveCurrent(FSImage.java:1170)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.saveNamespace(FSImage.java:1118)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:100)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:347)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:321)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:267)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:461)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1202)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1211)
2011-04-26 17:15:57,273 WARN org.apache.hadoop.hdfs.server.common.Storage: FSImage:processIOError: removing storage: /var/lib/hadoop-0.20/cache/hadoop/dfs/name
2011-04-26 17:15:57,274 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Finished loading FSImage in 1553 msecs

On Tue, Apr 26, 2011 at 5:19 PM, Jonathan Bender wrote:
> Wow, this is more intense than I thought...as soon as I load HBase again,
> my HDFS filesystem essentially reverts to an older snapshot. As in, I
> don't see any of the changes I had made since that time, in the hbase
> table or otherwise.
>
> I'm using CDH3 beta 4, which I believe stores its local hbase data in a
> different directory--not entirely sure where though.
>
> I'm not entirely sure what happened to mess this up, but it seems pretty
> serious.
>
> On Tue, Apr 26, 2011 at 5:07 PM, Himanshu Vashishtha <
> hvashish@cs.ualberta.ca> wrote:
>
>> Could it be the /tmp/hbase- directory that is playing the culprit?
>> Just a wild guess though.
>>
>> On Tue, Apr 26, 2011 at 5:56 PM, Jean-Daniel Cryans wrote:
>>
>>> Unless HBase was running when you wiped that out (and even then), I
>>> don't see how this could happen. Could you match those blocks to the
>>> files using fsck and figure out when the files were created and
>>> whether they were part of the old install?
>>>
>>> Thx,
>>>
>>> J-D
>>>
>>> On Tue, Apr 26, 2011 at 4:53 PM, Jonathan Bender wrote:
>>> > Hi all, I'm having a strange error which I can't exactly figure out.
>>> >
>>> > After wiping my /hbase HDFS directory to do a fresh install, I am
>>> > getting "MISSING BLOCKS" in this /hbase directory, which causes HDFS
>>> > to start up in safe mode. This doesn't happen until I start my
>>> > region servers, so I have a feeling there is some kind of corrupted
>>> > metadata being loaded from these region servers.
>>> >
>>> > Is there a graceful way to wipe the HBase directory clean? Any local
>>> > directories on the region servers / master / ZK server that I should
>>> > be wiping as well?
>>> >
>>> > Cheers,
>>> > Jon
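[Editor's note: the fsck check J-D suggests and the "graceful wipe" Jon asks about can be sketched roughly as below. This is a hedged sketch, not a verified procedure: the `/hbase` root dir matches the thread, but the ZooKeeper invocation and `/tmp/hbase-*` scratch path are assumptions based on common CDH3-era defaults, and these commands require a running cluster.]

```shell
# Map the reported missing/corrupt blocks back to file names, so you can
# tell whether they belong to the old install (J-D's suggestion):
hadoop fsck /hbase -files -blocks -locations

# A cleaner wipe of HBase state (sketch; stop the HBase master and all
# region servers first, or HDFS leases like the saveLeases one above
# can be left behind):

# 1. Remove the HBase root directory in HDFS (0.20-era syntax).
hadoop fs -rmr /hbase

# 2. Delete HBase's znodes in ZooKeeper, otherwise stale region/server
#    state can be replayed on the next start. Host/port are assumptions;
#    older ZK shells lack a recursive delete, so each child may need its
#    own "delete".
zookeeper-client -server zkhost:2181 rmr /hbase

# 3. Clear local scratch dirs on every node (hbase.tmp.dir commonly
#    defaults under /tmp; the glob below is an assumption).
rm -rf /tmp/hbase-*

# 4. Restart HDFS, confirm "hadoop fsck / " reports HEALTHY, then start
#    HBase fresh.
```

With the namenode already failing to save its image, the fsck output is also worth keeping before any deletes, since it records which paths the bad blocks belonged to.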