hbase-user mailing list archives

From "Buckley,Ron" <buckl...@oclc.org>
Subject RE: Recovering hbase after a failure
Date Thu, 02 Oct 2014 18:00:06 GMT

Thanks. No WAL replay errors. Just about all the region servers logged a DroppedSnapshotException
and then aborted. I think we're good as far as that goes.


-----Original Message-----
From: Esteban Gutierrez [mailto:esteban@cloudera.com] 
Sent: Thursday, October 02, 2014 1:26 PM
To: user@hbase.apache.org
Subject: Re: Recovering hbase after a failure

Hi Ron,

Look in the logs for DroppedSnapshotException entries, and check for puts or deletes that
skip the WAL. If everything is good there, then clients should have handled the
unavailability of HBase and there should not be any data loss on the server side. Also
double-check that there were no errors replaying the WAL after the crash.
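
A minimal sketch of the log check described above, assuming the region server logs
are plain-text files you can read locally. The names scan_lines, scan_files and
PATTERNS are hypothetical, and the WAL-replay pattern is only a guess at what such
messages might look like; adjust it to whatever your HBase version actually logs.

```python
import re
from collections import Counter

# "DroppedSnapshotException" is the exception named in this thread.
# The WAL-replay pattern is an assumption, not a documented HBase log format.
PATTERNS = {
    "dropped_snapshot": re.compile(r"DroppedSnapshotException"),
    "wal_replay_problem": re.compile(
        r"replay.*(error|fail|exception)", re.IGNORECASE
    ),
}


def scan_lines(lines):
    """Count how many lines match each pattern in PATTERNS."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def scan_files(paths):
    """Sum the per-pattern counts over several log files."""
    total = Counter()
    for path in paths:
        # errors="replace" keeps the scan going past any undecodable bytes
        with open(path, errors="replace") as handle:
            total.update(scan_lines(handle))
    return total
```

Calling scan_files() with the list of region server log paths returns a Counter
keyed by pattern name. A nonzero count only tells you where to look; it does not by
itself show that data was lost.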


Cloudera, Inc.

On Thu, Oct 2, 2014 at 10:18 AM, Buckley,Ron <buckleyr@oclc.org> wrote:

> We just had an event where, on our main hbase instance, the /hbase 
> directory got moved out from under the running system (Human error).
> HBase was really unhappy about that, but we were able to recover it 
> fairly easily and get back going.
> As far as I can tell, all the data and tables came back correct. But 
> I'm pretty concerned that there may be some hidden corruption or data loss.
> 'hbase hbck' runs clean and there are no new complaints in the logs.
> Can anyone think of anything else we should look at?