hbase-user mailing list archives

From Esteban Gutierrez <este...@cloudera.com>
Subject Re: Recovering hbase after a failure
Date Thu, 02 Oct 2014 18:26:37 GMT
Thanks for sharing the details Ron.

Did you move any WALs that might have been created back to the original
.logs directory? If some RSs rolled their WALs around the time of the first
mv, those logs should have been replayed after merging the contents of the
original /hbase dir with the contents of the /hbase created during the
crash. If not, then you probably have some missing data that still needs to
be replayed from those logs.
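
One rough way to check for such stray WALs is to diff recursive listings of
the two trees. The sketch below is only an illustration: it assumes the
/hbase created during the crash was set aside under a hypothetical path
(/hbase.crash here), that listings were captured with `hdfs dfs -ls -R`, and
that paths contain no spaces. The function names are made up for this
example. A WAL missing from the restored .logs is not necessarily
unreplayed, so treat the result as a list of files to inspect.

```python
def parse_ls_paths(ls_output):
    """Return the set of file paths found in `hdfs dfs -ls -R` style output."""
    paths = set()
    for line in ls_output.splitlines():
        fields = line.split()
        # Skip blank lines, "Found N items" headers, and directory entries
        # (whose permission string starts with 'd').
        if len(fields) < 8 or fields[0].startswith("d"):
            continue
        paths.add(fields[-1])
    return paths


def relative_wals(paths, root, wal_dir=".logs"):
    """Keep only paths under <root>/<wal_dir>/ and strip that prefix."""
    prefix = root.rstrip("/") + "/" + wal_dir + "/"
    return {p[len(prefix):] for p in paths if p.startswith(prefix)}


def missing_wals(crash_ls, restored_ls, crash_root, restored_root):
    """List WALs present in the crash-time tree but absent from the restored one."""
    crash = relative_wals(parse_ls_paths(crash_ls), crash_root)
    restored = relative_wals(parse_ls_paths(restored_ls), restored_root)
    return sorted(crash - restored)
```

Usage would be along the lines of
missing_wals(crash_listing, restored_listing, "/hbase.crash", "/hbase"),
where both listings are the captured text output and /hbase.crash stands in
for wherever the crash-time directory actually ended up.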

esteban.


--
Cloudera, Inc.


On Thu, Oct 2, 2014 at 11:17 AM, Buckley,Ron <buckleyr@oclc.org> wrote:

> FWIW, in case something like this happens to someone else.
>
> To recover this, the first thing I tried was to just mv the /hbase
> directory back.   That doesn’t work.
>
> To get going again, we had to completely shut down and restart.
>
> Also, once the original /hbase got mv'd, a few of the region servers did
> some flushes before they aborted. Those RSs actually created a new
> /hbase, with new table directories, but only containing the data from the
> flush.
>
>
> -----Original Message-----
> From: Buckley,Ron
> Sent: Thursday, October 02, 2014 2:09 PM
> To: hbase-user
> Subject: RE: Recovering hbase after a failure
>
> Nick,
>
> Good ideas. Compared file and region counts with our DR site. Things
> look OK. Going to run some rowcounters too.
>
> Feels like we got off easy.
>
> Ron
>
> -----Original Message-----
> From: Nick Dimiduk [mailto:ndimiduk@gmail.com]
> Sent: Thursday, October 02, 2014 1:27 PM
> To: hbase-user
> Subject: Re: Recovering hbase after a failure
>
> Hi Ron,
>
> Yikes!
>
> Do you have any basic metrics regarding the amount of data in the system
> -- size of store files before the incident, number of records, &c?
>
> You could sift through the HDFS audit log and see if any files that were
> there previously have not been restored.
>
> -n
>
> On Thu, Oct 2, 2014 at 10:18 AM, Buckley,Ron <buckleyr@oclc.org> wrote:
>
> > We just had an event where, on our main hbase instance, the /hbase
> > directory got moved out from under the running system (Human error).
> >
> > HBase was really unhappy about that, but we were able to recover it
> > fairly easily and get back going.
> >
> > As far as I can tell, all the data and tables came back correct. But,
> > I'm pretty concerned that there may be some hidden corruption or data
> > loss.
> >
> > 'hbase hbck'  runs clean and there are no new complaints in the logs.
> >
> > Can anyone think of anything else we should look at?
> >
> >
> >
> >
> >
>
