hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jonathan Hsieh (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-6586) Quarantine Corrupted HFiles
Date Thu, 16 Aug 2012 17:05:38 GMT

    [ https://issues.apache.org/jira/browse/HBASE-6586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13436106#comment-13436106

Jonathan Hsieh commented on HBASE-6586:

Looks like if we go with the hbck approach instead of the autofix approach, we'll should to
add yet another state for a region which I'll call something like QUARANTINED.  Currently,
corrupted HFiles cause a region to go into RS_ZK_REGION_FAILED_OPEN, which eventually gets
transitioned to M_ZK_REGION_OFFLINE which then transitions to RS_ZK_REGION_OPENING triggering
another attempt to open the region (which fails and ...).  Ideally when we have a none-recoverable
failure like corrupted hfiles, we'd transition to QUARANTINED instead of FAILED_OPEN and stay
there until an admin fixes the problem.

> Quarantine Corrupted HFiles
> ---------------------------
>                 Key: HBASE-6586
>                 URL: https://issues.apache.org/jira/browse/HBASE-6586
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
> We've encountered a few upgrades from 0.90 hbases + 20.2/1.x hdfs to 0.92 hbases + hdfs
2.x that get stuck.  I haven't been able to duplicate the problem in my dev environment but
we suspect this may be related to HDFS-3731.  On the HBase side, it seems reasonable to quarantine
what are most likely truncated hfiles, so that can could later be recovered.
> Here's an example of the exception we've encountered:
> {code}
> 2012-07-18 05:55:01,152 ERROR handler.OpenRegionHandler (OpenRegionHandler.java:openRegion(346))
- Failed open of region=user_mappings,080112102AA76EF98197605D341B9E6C5824D2BC|1001,1317824890618.eaed0e7abc6d27d28ff0e5a9b49c4c
> 0d. 
> java.io.IOException: java.lang.IllegalArgumentException: Invalid HFile version: 842220600
(expected to be between 1 and 2) 
> at org.apache.hadoop.hbase.io.hfile.FixedFileTrailer.readFromStream(FixedFileTrailer.java:306)

> at org.apache.hadoop.hbase.io.hfile.HFile.pickReaderVersion(HFile.java:371) 
> at org.apache.hadoop.hbase.io.hfile.HFile.createReader(HFile.java:387) 
> at org.apache.hadoop.hbase.regionserver.StoreFile$Reader.<init>(StoreFile.java:1026)

> at org.apache.hadoop.hbase.regionserver.StoreFile.open(StoreFile.java:485) 
> at org.apache.hadoop.hbase.regionserver.StoreFile.createReader(StoreFile.java:566) 
> at org.apache.hadoop.hbase.regionserver.Store.loadStoreFiles(Store.java:286) 
> at org.apache.hadoop.hbase.regionserver.Store.<init>(Store.java:223) 
> at org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:2534)

> at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:454) 
> at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:3282) 
> at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:3230) 
> at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:331)
> at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:107)
> at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:169) 
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)

> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) 
> at java.lang.Thread.run(Thread.java:619) 
> Caused by: java.lang.IllegalArgumentException: Invalid HFile version: 842220600 (expected
to be between 1 and 2) 
> at org.apache.hadoop.hbase.io.hfile.HFile.checkFormatVersion(HFile.java:515) 
> at org.apache.hadoop.hbase.io.hfile.FixedFileTrailer.readFromStream(FixedFileTrailer.java:303)

> ... 17 more
> {code}
> Specifically -- the FixedFileTrailer are incorrect, and seemingly missing.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira


View raw message