hbase-user mailing list archives

From elsif <elsif.t...@gmail.com>
Subject Re: HBase regionserver failure
Date Wed, 30 Sep 2009 22:29:12 GMT
stack wrote:
> On Mon, Sep 28, 2009 at 3:27 PM, elsif <elsif.then@gmail.com> wrote:
>
>> Our HBase system ended up in a looping situation trying to continuously
>> re-assign a damaged region across the HBase cluster. We could not properly
>> scan or store data in the affected table.
>>
>> The triggering event that caused this cascade of errors was a
>> java.io.IOException: Added a key not lexically larger than previous
>>
>
> Here are the offending keys purportedly:
>
> key=^@?/data/dir/e22677afdb73cc17ed82a974058859e5e74b78ca5a7917cceaa500d2dd3198094ecf91792e8ddafc4768cfb4^D11ff79bb955c50b99c8f80fdff0b4beb413d8ea/2009-09-25_034206^Ejson:^@^@^A#?M?)^D,
> lastkey=^@?/data/dir/e22677afdb73cc17ed82a974058859e5e74b78ca5a7917cceaa500d2dd3198094ecf91792e8ddafc4768cfb4d11ff79bb955c50b99c8f80fdff0b4beb413d8ea^Ejson:^@^@^A#?M?#^D
>
> It seems like keys are fine till we get to '^D'.    Can you make these keys
> or comment on them?  The '^D' is a printable version of whatever the bit of
> binary was here.  Do you have an idea what it was?  Can you remanufacture
> this condition?  Something in our comparator is messing up?  Is that
> possible?
>
>   

The keys are all plain text strings with no special characters. I'm not
sure where the '^D' would come from, since the same process is used to
generate all the keys.
> Is this in the .META. table?
>   

This is from a regular table.
>
>> Using the HBase shell command "scan '.META.'", we confirmed the encoded name
>> of the damaged region stored in HDFS. In an attempt to fix this, the data
>> directory for the impacted region was moved off HDFS, and the region could
>> then be restarted with a blank slate.
>>
>> Is there a better way to handle this type of failure?
>>
>
> There is a script that will repair the broken files, rewriting them without
> the offending edit. I'd point you at the script, only it's up in an Apache
> JIRA and that's down at the moment.
>
> You could try running:
>
> ./bin/hbase org.apache.hadoop.hbase.io.hfile.HFile
>
> It has diagnostic and output facilities. Pass it the bad files.
>
I scanned each of the files with the -k option; no warnings were generated.

I also extracted all the key values from each file - none of them appear
to contain the key with the '^D'. 
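A short Python sketch of the kind of check described above: scan keys you have already extracted as raw bytes for control bytes, and render them in the caret notation the error log uses (0x04 printed as '^D', 0x00 as '^@'). The function names are hypothetical, and how the keys are pulled out of the HFiles is not shown. Apply it to the row portion only, since a full serialized HBase key also carries binary length and timestamp fields and a type byte that will legitimately show up as control bytes.

```python
from typing import List, Tuple


def caret(data: bytes) -> str:
    """Render bytes like the error log: control bytes as ^X, high bytes as \\xNN."""
    out = []
    for b in data:
        if b < 0x20:
            out.append("^" + chr(b + 0x40))  # 0x04 -> '^D', 0x00 -> '^@'
        elif b == 0x7F:
            out.append("^?")  # DEL
        elif b >= 0x80:
            out.append(f"\\x{b:02x}")  # non-ASCII byte, shown as an escape
        else:
            out.append(chr(b))  # printable ASCII
    return "".join(out)


def control_bytes(key: bytes) -> List[Tuple[int, int]]:
    """Return (offset, value) for every control byte found in a key."""
    return [(i, b) for i, b in enumerate(key) if b < 0x20 or b == 0x7F]


if __name__ == "__main__":
    # A clean row fragment and the same fragment with 'd' replaced by 0x04.
    for key in (b"cfb4d11ff79b", b"cfb4\x0411ff79b"):
        print(caret(key), control_bytes(key))
```

An empty list for every extracted row is consistent with what was found here: no '^D' byte in the stored data.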

The 'key' and 'lastkey' listed above were contained in the
oldlogfile.log.  I opened the oldlogfile.log with a hex editor and
verified that the key does not contain any binary characters where the
'^D' is shown in the error log.  The character is actually a lowercase 'd':

/data/dir/e22677afdb73cc17ed82a974058859e5e74b78ca5a7917cceaa500d2dd3198094ecf91792e8ddafc4768cfb4d11ff79bb955c50b99c8f80fdff0b4beb413d8ea/2009-09-25_034206

It would seem this was a read error of some kind.
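To see why a single misread byte yields exactly this exception, here is a toy Python model (not HBase's code) of a sorted-file writer that rejects a key sorting before the previous one. It compares row bytes only, ignoring the family, qualifier, timestamp and type parts of a real HBase key, and assumes the unsigned lexicographic byte order HBase uses for rows, which Python's `bytes` comparison also implements. The rows are the ones from this thread: the true row from oldlogfile.log sorts after lastkey's row, because lastkey's row is a prefix of it, but with 'd' (0x64) read as 0x04 it sorts before it.

```python
from typing import List, Optional

# Rows from this thread, split at the byte the log printed as '^D'.
PREFIX = (b"/data/dir/e22677afdb73cc17ed82a974058859e5e74b78ca5a7917cce"
          b"aa500d2dd3198094ecf91792e8ddafc4768cfb4")
SUFFIX = b"11ff79bb955c50b99c8f80fdff0b4beb413d8ea"

LAST_ROW = PREFIX + b"d" + SUFFIX                            # row of 'lastkey'
GOOD_ROW = LAST_ROW + b"/2009-09-25_034206"                  # row as stored in oldlogfile.log
BAD_ROW = PREFIX + b"\x04" + SUFFIX + b"/2009-09-25_034206"  # 'd' misread as 0x04


class OrderedKeyWriter:
    """Hypothetical writer that only accepts keys in non-decreasing order."""

    def __init__(self) -> None:
        self.last_key: Optional[bytes] = None
        self.keys: List[bytes] = []

    def append(self, key: bytes) -> None:
        # bytes compare lexicographically as unsigned values; a shorter key
        # that is a prefix of a longer one sorts first.
        if self.last_key is not None and key < self.last_key:
            raise IOError(
                "Added a key not lexically larger than previous "
                f"key={key!r}, lastkey={self.last_key!r}"
            )
        self.last_key = key
        self.keys.append(key)


if __name__ == "__main__":
    ok = OrderedKeyWriter()
    ok.append(LAST_ROW)
    ok.append(GOOD_ROW)  # accepted: LAST_ROW is a prefix of GOOD_ROW

    broken = OrderedKeyWriter()
    broken.append(LAST_ROW)
    try:
        broken.append(BAD_ROW)  # rejected: 0x04 < 0x64 at the same offset
    except IOError as err:
        print("rejected:", str(err)[:60])
```

The toy accepts a key equal to the previous one; treat that detail as an assumption rather than a statement about the real writer.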

>
>> Is there a way to generate an hlog to re-import the data files we moved
>> away?
>>
>
> The above-mentioned script is probably the better way to go.
>
>> HBase Version: 0.20.0, r805538
>> Hadoop Version: 0.20.0-plus4681, r767961
>>
>
> Are these the 0.20.0 releases?
>
The Hadoop is release 0.20.0; the HBase is a pre-release SVN checkout.
> St.Ack