hbase-user mailing list archives

From Stack <saint....@gmail.com>
Subject Re: HBase regionserver failure
Date Thu, 01 Oct 2009 00:33:42 GMT
Can you make an issue and post the offending old logfile plus snippet  
from regionserver log leading up to the exception?

What if you put the log back in place?  Can you make the exception
happen again?

Thanks



On Sep 30, 2009, at 3:29 PM, elsif <elsif.then@gmail.com> wrote:

> stack wrote:
>> On Mon, Sep 28, 2009 at 3:27 PM, elsif <elsif.then@gmail.com> wrote:
>>
>>
>>> Our HBase system ended up in a looping situation trying to continuously
>>> re-assign a damaged region across the HBase cluster. We could not properly
>>> scan or store data in the affected table.
>>>
>>> The triggering event that caused this cascade of errors was a
>>> java.io.IOException: Added a key not lexically larger than previous
>>>
>>>
>>
>>
>> Here are the offending keys purportedly:
>>
>> key=^@?/data/dir/e22677afdb73cc17ed82a974058859e5e74b78ca5a7917cceaa500d2dd3198094ecf91792e8ddafc4768cfb4^D11ff79bb955c50b99c8f80fdff0b4beb413d8ea/2009-09-25_034206^Ejson:^@^@^A#?M?)^D,
>> lastkey=^@?/data/dir/e22677afdb73cc17ed82a974058859e5e74b78ca5a7917cceaa500d2dd3198094ecf91792e8ddafc4768cfb4d11ff79bb955c50b99c8f80fdff0b4beb413d8ea^Ejson:^@^@^A#?M?#^D
>>
>>
>>
>> It seems like keys are fine till we get to '^D'.  Can you make these keys
>> or comment on them?  The '^D' is a printable version of whatever the bit of
>> binary was here.  Do you have an idea what it was?  Can you remanufacture
>> this condition?  Something in our comparator is messing up?  Is that
>> possible?
>>
>>
>
> The keys are all plain text strings with no special characters.  Not
> sure where the '^D' would come from since the same process is used to
> generate all the keys.
>> This is in .META. table?
>>
>
> This is from a regular table.
>>
>>
>>
>>
>>
>>> From the HBase shell "scan '.META.'" command we confirmed the name of the
>>> damaged encoded region stored in hdfs. In an attempt to fix this, the data
>>> directory for the impacted region was moved off hdfs and the region was
>>> able to be restarted with a blank slate.
>>>
>>> Is there a better way to handle this type of failure?
>>>
>>>
>>>
>>
>> There is a script that will repair the broken files, rewriting them and
>> removing the offending edit.  I'd point you at the script, only it's up in
>> an Apache JIRA and that's sick at the moment.
>>
>> You could try running:
>>
>> ./bin/hbase org.apache.hadoop.hbase.io.hfile.HFile
>>
>> It has diagnostic and outputting facility.  Pass it the bad files.
>>
>>
>>
> I scanned each of the files with the -k option; no warnings were generated.
>
> I also extracted all the key values from each file - none of them appear
> to contain the key with the '^D'.
>
> The 'key' and 'lastkey' listed above were contained in the
> oldlogfile.log.  I opened the oldlogfile.log with a hex editor and
> verified that the key does not contain any binary characters where the
> '^D' is shown in the error log.  The character is actually a lowercase 'd':
>
> /data/dir/e22677afdb73cc17ed82a974058859e5e74b78ca5a7917cceaa500d2dd3198094ecf91792e8ddafc4768cfb4d11ff79bb955c50b99c8f80fdff0b4beb413d8ea/2009-09-25_034206
>
> It would seem this was a read error of some kind.
>
>>
>>
>>> Is there a way to generate an hlog to re-import the data files we moved
>>> away?
>>>
>>>
>>>
>>
>> Above mentioned script is probably the better way to go.
>>
>>
>>
>>
>>> HBase Version: 0.20.0, r805538
>>> Hadoop Version: 0.20.0-plus4681, r767961
>>>
>>>
>>>
>> Are these release 0.20.0?
>>
> The hadoop is release 0.20.0 - the hbase is a pre-release svn checkout.
>> St.Ack
>>
>>
>>
>>
>
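
A side note on the '^D' versus 'd' observation quoted above. What follows is a minimal sketch, in Python rather than HBase's Java, of why one misread byte could trip an ordering check like the one named in the exception. It compares only the tail of the row bytes quoted in this thread using plain unsigned byte order. The real HBase key layout and comparator are not reproduced here, and the variable names and the helper first_difference are made up for illustration.

```python
# Illustrative sketch only: plain unsigned byte-wise comparison of row tails.
# This is NOT HBase's comparator; all names below are hypothetical.

# Tail of the row as quoted in the thread, after the "...cfb4" + 'd' position.
TAIL = b"11ff79bb955c50b99c8f80fdff0b4beb413d8ea"

# 'lastkey' row tail as printed in the error message.
last_key = b"cfb4" + b"d" + TAIL
# Row tail as seen in oldlogfile.log with a hex editor (lowercase 'd').
key_in_log_file = b"cfb4" + b"d" + TAIL + b"/2009-09-25_034206"
# Row tail as printed in the error message, with ^D (byte 0x04) instead of 'd'.
key_in_error = b"cfb4" + b"\x04" + TAIL + b"/2009-09-25_034206"


def first_difference(a: bytes, b: bytes):
    """Index of the first differing byte, or None if the common prefix matches."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return None


# Python compares bytes objects lexicographically by unsigned byte value.
print("file key sorts after lastkey: ", key_in_log_file > last_key)
print("error key sorts after lastkey:", key_in_error > last_key)

i = first_difference(key_in_log_file, key_in_error)
print("first differing index:", i)
print("bytes:", hex(key_in_log_file[i]), "vs", hex(key_in_error[i]))
print("xor:  ", bin(key_in_log_file[i] ^ key_in_error[i]))
```

Under these assumptions the key as stored in the log file sorts after lastkey, while the same key with 0x04 in place of 'd' (0x64) sorts before it, which fits the read-error theory. The two byte values differ in two bits (0x60), so a single flipped bit alone would not account for the change.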
