hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: HBase recovery
Date Fri, 13 Aug 2010 22:26:15 GMT
Ok, so if not all the data came back then it could be a bug, although
it could have already been fixed since we iterate very fast on the
0.89 releases (which are dev preview releases, not meant for
production).

When a region server crashes, the master splits all the write-ahead
logs and the regions are then distributed to the remaining region
servers. It's all automatic. Even if it happened during a major
compaction, the original store files aren't deleted until the new
store file is created.
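
As an illustration of the recovery flow described above, here is a toy model in Python. It is only a sketch under stated assumptions, not HBase's actual code: the function names `split_log` and `reassign`, the tuple layout of a log entry, and the round-robin placement are all hypothetical and chosen for illustration.

```python
from collections import defaultdict


def split_log(entries):
    """Group write-ahead-log edits by region, preserving log order.

    `entries` is an iterable of (region_name, edit) pairs. This layout is
    an assumption made for the sketch, not HBase's on-disk format.
    """
    per_region = defaultdict(list)
    for region, edit in entries:
        per_region[region].append(edit)
    return dict(per_region)


def reassign(regions, live_servers):
    """Spread the dead server's regions over the surviving servers.

    Round-robin is used purely for illustration; it is not a claim about
    how the HBase master actually balances regions.
    """
    if not live_servers:
        raise ValueError("no live region servers to take over the regions")
    return {
        region: live_servers[i % len(live_servers)]
        for i, region in enumerate(sorted(regions))
    }


# Hypothetical log of a crashed server that held regions r1 and r2.
log = [("r1", "put a"), ("r2", "put b"), ("r1", "put c")]
edits_by_region = split_log(log)
placement = reassign(edits_by_region, ["rs2", "rs3"])
```

Each surviving server would then replay the edits for the regions it was handed, which is why edits that never reached the log cannot be recovered.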

Did the master encounter any fatal exceptions while splitting the
logs? Did you take a look at its log file? Can you figure out which
rows in .META. are missing (there would be holes)?
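
A hole here means a gap in the chain of region key ranges: each region's end key should equal the next region's start key. The Python sketch below checks that property on a list of boundaries. It assumes the (start key, end key) pairs for one table have already been collected by some other means, for example by reading them off a scan of .META.; the function name `find_holes` is hypothetical and nothing here calls an HBase API.

```python
def find_holes(regions):
    """Return the key ranges not covered by any region of a table.

    `regions` is an iterable of (start_key, end_key) byte strings.
    An empty key (b"") means "start of table" as a start key and
    "end of table" as an end key.
    """
    holes = []
    covered_to = b""  # the table is covered from its start up to this key
    for start, end in sorted(regions):
        if start > covered_to:
            holes.append((covered_to, start))
        if end == b"":
            return holes  # this region runs to the end of the table
        covered_to = max(covered_to, end)
    # No region reached the end of the table, so the tail is missing.
    holes.append((covered_to, b""))
    return holes


# Hypothetical boundaries: the region covering [d, f) is missing.
example = [(b"", b"b"), (b"b", b"d"), (b"f", b"")]
missing = find_holes(example)
```

An empty result means the ranges form a complete chain; it says nothing about whether the data inside each region is intact.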

J-D

On Fri, Aug 13, 2010 at 3:18 PM, Jeremy Carroll
<jeremy.carroll@networkedinsights.com> wrote:
> We are using CDH3 Beta 2.
> ________________________________________
> From: jdcryans@gmail.com [jdcryans@gmail.com] On Behalf Of Jean-Daniel Cryans [jdcryans@apache.org]
> Sent: Friday, August 13, 2010 4:50 PM
> To: user@hbase.apache.org
> Subject: Re: HBase recovery
>
> Which version? Prior to HBase 0.89 + Hadoop 0.20-append (or cdh3),
> HBase cannot guarantee durability of the latest inserts (this includes
> edits to .META.)
>
> J-D
>
> On Fri, Aug 13, 2010 at 2:45 PM, Jeremy Carroll
> <jeremy.carroll@networkedinsights.com> wrote:
>> During some testing of a small development cluster, one of the
>> RegionServers that we employ has an issue with a bad RAM stick, so when
>> it gets into heavy RAM operation it likes to crash. Here is my question.
>> We had an issue where the RegionServer holding .META. crashed. The entire
>> cluster was unusable, as it did not reassign .META. to a different region
>> server. Also, when the server goes down, what happens to all the regions
>> that it held? Does it reassign them to other region servers? Also, what is
>> the correct action for recovery? It crashed during a major_compaction, so
>> how do I verify that I am not missing data? I see that I had 166 regions
>> online on this server before the crash, and now after the crash it has
>> 158. What are the correct steps to recover HBase after a major crash?
>
