hbase-user mailing list archives

From Esteban Gutierrez <este...@cloudera.com>
Subject Re: Recovering hbase after a failure
Date Thu, 02 Oct 2014 18:36:03 GMT
I'm not sure we should use the deprecated API. Calling isDirectory
shouldn't be that expensive on the NN, but it will add another RPC call per
flush.

esteban.

--
Cloudera, Inc.
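
The difference between a recursive and a non-recursive create, which is what the createNonRecursive suggestion quoted below turns on, can be sketched on a local filesystem. This is only an analogy in Python, not HDFS or HBase code: the function names, the directory layout, and the file name `flush-0001` are all hypothetical, chosen to mirror the incident described in the thread.

```python
import os
import tempfile


def create_recursive(path, data):
    """Create the file, first creating any missing parent directories.

    This mirrors the behaviour reported in the thread: a writer can
    silently rebuild a directory tree that was moved away.
    """
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)


def create_non_recursive(path, data):
    """Create the file only if its parent directory already exists.

    open() raises FileNotFoundError when the parent is missing, so a
    moved root shows up as an error instead of a new, partial tree.
    """
    with open(path, "xb") as f:
        f.write(data)


def demo():
    base = tempfile.mkdtemp()
    root = os.path.join(base, "hbase")
    region = os.path.join(root, "table", "region")
    os.makedirs(region)

    # Simulate the incident: the root is moved out from under the writer.
    os.rename(root, os.path.join(base, "hbase.moved"))

    flush_file = os.path.join(region, "flush-0001")

    # Non-recursive create: fails, and the root is not re-created.
    try:
        create_non_recursive(flush_file, b"memstore data")
        non_recursive_failed = False
    except FileNotFoundError:
        non_recursive_failed = True
    root_after_non_recursive = os.path.exists(root)

    # Recursive create: succeeds and brings back a new, mostly empty root.
    create_recursive(flush_file, b"memstore data")
    root_after_recursive = os.path.exists(root)

    return non_recursive_failed, root_after_non_recursive, root_after_recursive
```

In this sketch the non-recursive variant needs no extra existence check before the write; the create call itself reports the missing parent. An explicit directory check before each create would catch the same condition at the cost of one more call per write, which is the overhead concern raised above.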


On Thu, Oct 2, 2014 at 11:26 AM, Andrew Purtell <apurtell@apache.org> wrote:

> On Thu, Oct 2, 2014 at 11:17 AM, Buckley,Ron <buckleyr@oclc.org> wrote:
>
> > Also, once the original /hbase got mv'd, a few of the region servers did
> > some flushes before they aborted. Those RSs actually created a new
> > /hbase, with new table directories, but containing only the data from the
> > flush.
>
>
> Sounds like we should be creating flush files with createNonRecursive (even
> though it's deprecated)
>
>
> On Thu, Oct 2, 2014 at 11:17 AM, Buckley,Ron <buckleyr@oclc.org> wrote:
>
> > FWIW, in case something like this happens to someone else.
> >
> > To recover this, the first thing I tried was to just mv the /hbase
> > directory back.   That doesn’t work.
> >
> > To get going again, we had to completely shut down and restart.
> >
> > Also, once the original /hbase got mv'd, a few of the region servers did
> > some flushes before they aborted. Those RSs actually created a new
> > /hbase, with new table directories, but containing only the data from the
> > flush.
> >
> >
> > -----Original Message-----
> > From: Buckley,Ron
> > Sent: Thursday, October 02, 2014 2:09 PM
> > To: hbase-user
> > Subject: RE: Recovering hbase after a failure
> >
> > Nick,
> >
> > Good ideas. Compared file and region counts with our DR site. Things
> > look OK. Going to run some rowcounters too.
> >
> > Feels like we got off easy.
> >
> > Ron
> >
> > -----Original Message-----
> > From: Nick Dimiduk [mailto:ndimiduk@gmail.com]
> > Sent: Thursday, October 02, 2014 1:27 PM
> > To: hbase-user
> > Subject: Re: Recovering hbase after a failure
> >
> > Hi Ron,
> >
> > Yikes!
> >
> > Do you have any basic metrics regarding the amount of data in the system
> > -- size of store files before the incident, number of records, &c?
> >
> > You could sift through the HDFS audit log and see if any files that were
> > there previously have not been restored.
> >
> > -n
> >
> > On Thu, Oct 2, 2014 at 10:18 AM, Buckley,Ron <buckleyr@oclc.org> wrote:
> >
> > > We just had an event where, on our main hbase instance, the /hbase
> > > directory got moved out from under the running system (human error).
> > >
> > > HBase was really unhappy about that, but we were able to recover it
> > > fairly easily and get back going.
> > >
> > > As far as I can tell, all the data and tables came back correct. But
> > > I'm pretty concerned that there may be some hidden corruption or data
> > > loss.
> > >
> > > 'hbase hbck'  runs clean and there are no new complaints in the logs.
> > >
> > > Can anyone think of anything else we should look at?
> > >
> > >
> > >
> > >
> > >
> >
>
>
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>
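
The checks suggested in this thread (comparing file counts against a DR site, and looking for files that existed before the incident but were not restored) come down to diffing two file listings. A minimal sketch follows, assuming each listing has already been captured as plain text with one `<size> <path>` pair per line. That input format and both function names are hypothetical, not the output of any particular HDFS or HBase tool.

```python
def parse_listing(text):
    """Parse lines of the form '<size> <path>' into a {path: size} dict."""
    listing = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        size, path = line.split(None, 1)
        listing[path] = int(size)
    return listing


def diff_listings(before, after):
    """Compare two {path: size} snapshots.

    Returns (missing, added, resized), each a sorted list of paths:
    missing  - present before, absent after (candidates for data loss)
    added    - absent before, present after
    resized  - present in both, but with a different size
    """
    missing = sorted(p for p in before if p not in after)
    added = sorted(p for p in after if p not in before)
    resized = sorted(
        p for p in before if p in after and before[p] != after[p]
    )
    return missing, added, resized
```

An empty `missing` list only shows that no path disappeared between the two snapshots. It says nothing about the contents of the files, so it complements the row counts and hbck run mentioned above and does not replace them.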
