hbase-user mailing list archives

From Andrew Purtell <apurt...@apache.org>
Subject Re: Recovering hbase after a failure
Date Thu, 02 Oct 2014 21:17:22 GMT
14 if you count createNewFile :-)

http://search-hadoop.com/m/282AcZLDAp1. Maybe you could tap Andrew or Colin
on the shoulder Esteban?
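
For anyone following along, here is a minimal sketch of the atomicity point from the earlier message quoted below. It is a local-filesystem analogy using java.nio, not the HDFS API. The class name FlushCreateSketch, both method names, and the directory layout are made up for illustration. Files.createFile only stands in for a create call that refuses to build missing parents. It fails in a single operation when the parent directory has been renamed away, so there is no window between a separate existence check and the create.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

// Hypothetical sketch: local-filesystem analogy, not HBase or HDFS code.
class FlushCreateSketch {

    // Recursive-style create: quietly rebuilds any missing parent
    // directories, which is how a fresh, nearly empty root can appear.
    static Path createRecursive(Path file) throws IOException {
        Files.createDirectories(file.getParent());
        return Files.createFile(file);
    }

    // Non-recursive-style create: one call, no prior check. It throws
    // NoSuchFileException when the parent directory does not exist.
    static Path createNonRecursive(Path file) throws IOException {
        return Files.createFile(file);
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("flush-sketch");
        Path root = base.resolve("hbase");
        Path regionDir = root.resolve("table").resolve("region");
        Files.createDirectories(regionDir);

        // Simulate the incident: the root is renamed under a running writer.
        Files.move(root, base.resolve("hbase.moved"));

        try {
            createNonRecursive(regionDir.resolve("flush-1"));
        } catch (NoSuchFileException e) {
            System.out.println("non-recursive create refused: parent is gone");
        }

        // The recursive variant succeeds and recreates the whole tree.
        createRecursive(regionDir.resolve("flush-2"));
        System.out.println("recursive create rebuilt root: " + Files.isDirectory(root));
    }
}
```

With the non-recursive variant the writer gets an exception it can abort on. With the recursive variant it silently recreates the directory tree holding only the newly written file, which matches what Ron saw after the move.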


On Thu, Oct 2, 2014 at 2:13 PM, Andrew Purtell <apurtell@apache.org> wrote:

> It's not the round trip, it's the atomicity of the operation. Consider a
> rename happening between the isDirectory call and the subsequent create
> call. What would you have achieved by introducing the isDirectory check? I
> skimmed the FileSystem javadoc for 2.4.1 and none of the 13 non-deprecated
> create methods can provide the same semantics of createNonRecursive, shame.
>
>
> On Thu, Oct 2, 2014 at 11:36 AM, Esteban Gutierrez <esteban@cloudera.com>
> wrote:
>
>> I'm not sure if we should use the deprecated API, calling isDirectory
>> shouldn't be that expensive in the NN but it will add another RPC call per
>> flush.
>>
>> esteban.
>>
>> --
>> Cloudera, Inc.
>>
>>
>> On Thu, Oct 2, 2014 at 11:26 AM, Andrew Purtell <apurtell@apache.org>
>> wrote:
>>
>> > On Thu, Oct 2, 2014 at 11:17 AM, Buckley,Ron <buckleyr@oclc.org>
>> wrote:
>> >
>> > > Also, once the original /hbase got mv'd, a few of the region servers
>> did
>> > > some flush's before they aborted.   Those RS's actually created a new
>> > > /hbase, with new table directories, but only containing the data from
>> the
>> > > flush.
>> >
>> >
>> > Sounds like we should be creating flush files with createNonRecursive
>> (even
>> > though it's deprecated)
>> >
>> >
>> > On Thu, Oct 2, 2014 at 11:17 AM, Buckley,Ron <buckleyr@oclc.org> wrote:
>> >
>> > > FWIW, in case something like this happens to someone else.
>> > >
>> > > To recover this, the first thing I tried was to just mv the /hbase
>> > > directory back.   That doesn’t work.
>> > >
>> > > To get back going had to completely shut down and restart.
>> > >
>> > > Also, once the original /hbase got mv'd, a few of the region servers
>> did
>> > > some flush's before they aborted.   Those RS's actually created a new
>> > > /hbase, with new table directories, but only containing the data from
>> the
>> > > flush.
>> > >
>> > >
>> > > -----Original Message-----
>> > > From: Buckley,Ron
>> > > Sent: Thursday, October 02, 2014 2:09 PM
>> > > To: hbase-user
>> > > Subject: RE: Recovering hbase after a failure
>> > >
>> > > Nick,
>> > >
>> > > Good ideas.    Compared  file and region counts with our DR site.
>> >  Things
>> > > looks OK.  Going to run some rowcounter's too.
>> > >
>> > > Feels like we got off easy.
>> > >
>> > > Ron
>> > >
>> > > -----Original Message-----
>> > > From: Nick Dimiduk [mailto:ndimiduk@gmail.com]
>> > > Sent: Thursday, October 02, 2014 1:27 PM
>> > > To: hbase-user
>> > > Subject: Re: Recovering hbase after a failure
>> > >
>> > > Hi Ron,
>> > >
>> > > Yikes!
>> > >
>> > > Do you have any basic metrics regarding the amount of data in the
>> system
>> > > -- size of store files before the incident, number of records, &c?
>> > >
>> > > You could sift through the HDFS audit log and see if any files that
>> were
>> > > there previously have not been restored.
>> > >
>> > > -n
>> > >
>> > > On Thu, Oct 2, 2014 at 10:18 AM, Buckley,Ron <buckleyr@oclc.org>
>> wrote:
>> > >
>> > > > We just had an event where, on our main hbase instance, the /hbase
>> > > > directory got moved out from under the running system (Human error).
>> > > >
>> > > > HBase was really unhappy about that, but we were able to recover it
>> > > > fairly easily and get back going.
>> > > >
>> > > > As far as I can tell, all the data and tables came back correct.
>> But,
>> > > > I'm pretty concerned that there may be some hidden corruption or
>> data
>> > > loss.
>> > > >
>> > > > 'hbase hbck'  runs clean and there are no new complaints in the
>> logs.
>> > > >
>> > > > Can anyone think of anything else we should look at?
>> >
>>
>

-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)
