hbase-user mailing list archives

From elsif <elsif.t...@gmail.com>
Subject Re: Table recovery options
Date Thu, 24 Sep 2009 21:41:33 GMT
Please see comments inline.

stack wrote:
> On Wed, Sep 23, 2009 at 4:15 PM, elsif <elsif.then@gmail.com> wrote:
>
>   
>> We have a couple clusters running with lzo compression.  When testing
>> the new 0.20.1 release
>>     
>
>
> You mean hadoop's new 0.20.1 release?
>
>   

This is with the hadoop 0.20.1 release and the hbase 0.20 branch, which
results in an hbase-0.20.1-dev.jar.

>> I setup a single node cluster and reused the
>> compression jar and native libraries from the 0.20.0 release.
>>     
>
>
>
> hadoop 0.20.0 release?
>
> What release of hbase are you using?
>
>
>
>   
>> The
>> following session log shows a table being created with the lzo option
>> and some rows being added.  After hbase is restarted the table is no
>> longer accessible - the region server crashed during the flush operation
>> due to a SIGFPE.
>>
>>     
>
> The flush that was done during shutdown?  The flush of the .META.?  If this
> failed, then state of .META. would not have been persisted and yes, you
> would have lost your table.
>
>
>
>   
>> Would it be possible to add a check to verify the compression feature
>> before it is used in a table to avoid corruption?  A simple shell or cli
>> option would be great.
>>
>>     
>
> Sounds like a good idea.  What would you suggest?  You could force a flush
> on a table with data on it and check if it worked or not?
>
>
>
>   

A flush would still cause data loss in this scenario as the region
server crashes from the library mismatch.  A standalone cli check that
could be run on each region server node after an install or upgrade but
before starting any of the hbase daemons would be better - that way no
data is in jeopardy.  I will code something up and submit it back to the
list.
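To make the idea concrete, something along these lines is what I have in
mind - just a rough, untested sketch that round-trips a small buffer
through a codec using the plain hadoop compression API, with the codec
class name taken from the command line:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionInputStream;
import org.apache.hadoop.io.compress.CompressionOutputStream;
import org.apache.hadoop.util.ReflectionUtils;

/**
 * Round-trips a small buffer through the named codec so that a missing
 * or mismatched native library fails here, on the command line, rather
 * than inside a region server.  The codec class name is passed as the
 * only argument.
 */
public class CodecSanityCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Class<?> codecClass = Class.forName(args[0]);
    CompressionCodec codec =
        (CompressionCodec) ReflectionUtils.newInstance(codecClass, conf);

    byte[] original = "hbase codec sanity check".getBytes("UTF-8");

    // Compress the buffer.
    ByteArrayOutputStream compressed = new ByteArrayOutputStream();
    CompressionOutputStream out = codec.createOutputStream(compressed);
    out.write(original);
    out.close();

    // Decompress it again and compare against the original.
    CompressionInputStream in = codec.createInputStream(
        new ByteArrayInputStream(compressed.toByteArray()));
    ByteArrayOutputStream decompressed = new ByteArrayOutputStream();
    IOUtils.copyBytes(in, decompressed, conf, true);

    if (Arrays.equals(original, decompressed.toByteArray())) {
      System.out.println("codec " + args[0] + " OK");
    } else {
      System.err.println("codec " + args[0] + " round-trip FAILED");
      System.exit(1);
    }
  }
}

Run on each region server node after an install or upgrade, passing
whatever LzoCodec class the compression jar provides, it should fail
fast with the same library error the region server would otherwise hit
during a flush.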

>> In general, once hbase tables are corrupted is there any way to repair
>> them?  - In this test case the table is never written to disk.
>>
>>
>>     
> Depends on the 'corruption'.  Generally yes, there are ways.  Below it seems
> a compression library mismatch is preventing hbase writing the filesystem.
> Can you fix this and retry?
>
>
>   

Fixing the compression library allows new tables to work cleanly.  The
original table remains corrupted, which is understandable.

>   
>> Is it possible to regenerate an hbase table from the data files stored
>> in hdfs?
>>
>>     
>
>
> Yes. You'd have to write a script.   In hdfs, under each region directory
> there is a file named .regioninfo.  It has the content of the .META. table
> for this region serialized.  A script could look at some subset of the
> regions on disk -- say all that make up a table and do fix up of .META.  On
> the next scan of .META. the table should be onlined.  Let me know if you'd
> like some help with this and we can work on it together.
>
>
>
>   

That would be great.  Do you have any samples or pseudo-code for the
operation?  Is there any documentation on the specific file contents?
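In the meantime, and mostly to check my understanding, here is a rough
and untested sketch of what I imagine such a script would look like
against the 0.20 client API.  The table directory is passed on the
command line, and the constant and method names (HConstants.CATALOG_FAMILY,
HConstants.REGIONINFO_QUALIFIER, Writables.getBytes) are from memory and
would need checking:

import java.io.IOException;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Writables;

/**
 * Walks the region directories of one table in hdfs, deserializes each
 * .regioninfo file, and re-inserts the corresponding row into .META.
 * The table directory (e.g. /hbase/mytable) is passed as the only
 * argument.
 */
public class RebuildMeta {
  public static void main(String[] args) throws IOException {
    HBaseConfiguration conf = new HBaseConfiguration();
    FileSystem fs = FileSystem.get(conf);
    HTable meta = new HTable(conf, HConstants.META_TABLE_NAME);

    for (FileStatus regionDir : fs.listStatus(new Path(args[0]))) {
      Path regioninfo = new Path(regionDir.getPath(), ".regioninfo");
      if (!fs.exists(regioninfo)) {
        continue;  // not a region directory
      }

      // .regioninfo holds a Writable-serialized HRegionInfo.
      FSDataInputStream in = fs.open(regioninfo);
      HRegionInfo hri = new HRegionInfo();
      hri.readFields(in);
      in.close();

      // Put the info:regioninfo cell back for this region; the master
      // should assign the region on its next scan of .META.
      Put p = new Put(hri.getRegionName());
      p.add(HConstants.CATALOG_FAMILY, HConstants.REGIONINFO_QUALIFIER,
          Writables.getBytes(hri));
      meta.put(p);
      System.out.println("restored .META. row for "
          + hri.getRegionNameAsString());
    }
  }
}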

>> Are there any preventative measures we can take to make it easier to
>> roll back to a valid state?
>>
>>     
>
> You can back up hbase content if it's small, or you can scan content from
> before the date at which invalid data shows.  What else would you like?
>   

Is there any benefit in storing snapshots of the .regioninfo file?
I'm guessing the table would have to be disabled during the copy?

It would be nice if there were a way to verify the health of a table and
report on any inconsistencies.
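On the earlier suggestion of scanning content from before the date at
which the invalid data shows up, something like the following is what I
would try - again just a rough sketch against the 0.20 client API, with
the table name and cutoff timestamp made up for illustration:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

/**
 * Reads back only cells written before a cutoff timestamp, e.g. to copy
 * the last known-good state into a fresh table.
 */
public class ScanBeforeCutoff {
  public static void main(String[] args) throws Exception {
    HBaseConfiguration conf = new HBaseConfiguration();
    HTable table = new HTable(conf, "mytable");

    // Cells at or after this timestamp (ms since epoch) are skipped.
    long cutoff = 1253800000000L;
    Scan scan = new Scan();
    scan.setTimeRange(0L, cutoff);

    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result r : scanner) {
        System.out.println(r);  // or write each row into a recovery table
      }
    } finally {
      scanner.close();
    }
  }
}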

>  St.Ack
>
>   


