hbase-issues mailing list archives

From "Jerry He (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-12949) Scanner can be stuck in infinite loop if the HFile is corrupted
Date Wed, 25 Feb 2015 21:15:05 GMT

    [ https://issues.apache.org/jira/browse/HBASE-12949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14337221#comment-14337221 ]

Jerry He commented on HBASE-12949:
----------------------------------

Hi, [~stack]

Thanks for getting back to this.
BufferUnderflowException and IllegalStateException are both subclasses of RuntimeException.
Both are unchecked, so neither is better nor worse. At least we are using IllegalStateException currently.

Also, BufferUnderflowException is where we cannot even read the length. The other is where
we can read the length, but it is bad.
Here is the case.
Two KVs:  <first-kv-we-can-read-the-length-but-the-length-is-bad><next-kv-we-can-not-even-read-the-length>
With the patch, we fail at the first bad kv.
Without the patch:
1. We can get stuck in the first bad kv.
2. We get a BufferUnderflowException on the next kv. This is what happened in the HFilePrettyPrinter,
which uses an ad hoc scanner.
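
To make the two cases concrete, here is a rough sketch of the kind of length sanity check involved
(illustrative only; the method name and message are simplified stand-ins, not the actual HFileReaderV2 code):
{code}
// Sketch only, not the actual patch. Two failure modes on a corrupt block:
//  - BufferUnderflowException: we cannot even read the lengths, getInt() itself fails.
//  - IllegalStateException: the lengths are readable but bad, so we fail fast on this KV.
static void readAndCheckKeyValueLen(java.nio.ByteBuffer blockBuffer) {
  int keyLen = blockBuffer.getInt();    // may throw BufferUnderflowException
  int valueLen = blockBuffer.getInt();  // may throw BufferUnderflowException
  if (keyLen <= 0 || valueLen < 0
      || (long) keyLen + valueLen > blockBuffer.remaining()) {
    throw new IllegalStateException("Invalid key/value lengths: keyLen=" + keyLen
        + ", valueLen=" + valueLen + ", remaining=" + blockBuffer.remaining());
  }
}
{code}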

> Scanner can be stuck in infinite loop if the HFile is corrupted
> ---------------------------------------------------------------
>
>                 Key: HBASE-12949
>                 URL: https://issues.apache.org/jira/browse/HBASE-12949
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.3, 0.98.10
>            Reporter: Jerry He
>         Attachments: HBASE-12949-master-v2 (1).patch, HBASE-12949-master-v2.patch, HBASE-12949-master-v2.patch, HBASE-12949-master.patch
>
>
> We've encountered a problem where compaction hangs and never completes.
> After looking into it further, we found that the compaction scanner was stuck in an infinite loop. See the stack below.
> {noformat}
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:296)
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.reseek(KeyValueHeap.java:257)
> org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:697)
> org.apache.hadoop.hbase.regionserver.StoreScanner.seekToNextRow(StoreScanner.java:672)
> org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:529)
> org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:223)
> {noformat}
> We identified the hfile that seems to be corrupted.  Using the HFile tool shows the following:
> {noformat}
> [biadmin@hdtest009 bin]$ hbase org.apache.hadoop.hbase.io.hfile.HFile -v -k -m -f /user/biadmin/CUMMINS_INSITE_V1/7106432d294dd844be15996ccbf2ba84/attributes/f1a7e3113c2c4047ac1fc8fbcb41d8b7
> 15/01/23 11:53:17 INFO Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
> 15/01/23 11:53:18 INFO util.ChecksumType: Checksum using org.apache.hadoop.util.PureJavaCrc32
> 15/01/23 11:53:18 INFO util.ChecksumType: Checksum can use org.apache.hadoop.util.PureJavaCrc32C
> 15/01/23 11:53:18 INFO Configuration.deprecation: fs.default.name is deprecated. Instead, use fs.defaultFS
> Scanning -> /user/biadmin/CUMMINS_INSITE_V1/7106432d294dd844be15996ccbf2ba84/attributes/f1a7e3113c2c4047ac1fc8fbcb41d8b7
> WARNING, previous row is greater then current row
>         filename -> /user/biadmin/CUMMINS_INSITE_V1/7106432d294dd844be15996ccbf2ba84/attributes/f1a7e3113c2c4047ac1fc8fbcb41d8b7
>         previous -> \x00/20110203-094231205-79442793-1410161293068203000\x0Aattributes16794406\x00\x00\x01\x00\x00\x00\x00\x00\x00
>         current  ->
> Exception in thread "main" java.nio.BufferUnderflowException
>         at java.nio.Buffer.nextGetIndex(Buffer.java:489)
>         at java.nio.HeapByteBuffer.getInt(HeapByteBuffer.java:347)
>         at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.readKeyValueLen(HFileReaderV2.java:856)
>         at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.next(HFileReaderV2.java:768)
>         at org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.scanKeysValues(HFilePrettyPrinter.java:362)
>         at org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.processFile(HFilePrettyPrinter.java:262)
>         at org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.run(HFilePrettyPrinter.java:220)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>         at org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.main(HFilePrettyPrinter.java:539)
>         at org.apache.hadoop.hbase.io.hfile.HFile.main(HFile.java:802)
> {noformat}
> Turning on Java assertions shows the following:
> {noformat}
> Exception in thread "main" java.lang.AssertionError: Key 20110203-094231205-79442793-1410161293068203000/attributes:16794406/1099511627776/Minimum/vlen=15/mvcc=0 followed by a smaller key //0/Minimum/vlen=0/mvcc=0 in cf attributes
>         at org.apache.hadoop.hbase.regionserver.StoreScanner.checkScanOrder(StoreScanner.java:672)
> {noformat}
> It shows that the hfile seems to be corrupted -- the keys don't seem to be right.
> But the Scanner is not able to give a meaningful error; instead it gets stuck in an infinite loop here:
> {code}
> // KeyValueHeap.generalizedSeek():
> while ((scanner = heap.poll()) != null) {
>   // with the corrupted hfile, this loop never terminates
> }
> {code}
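> To illustrate why that loop never exits, here is a small self-contained toy (simplified, assumed
> behavior only; ToyScanner is a stand-in, not the real KeyValueScanner/KeyValueHeap): a scanner that
> never advances past the bad key is polled and put back on the heap over and over, so the heap never drains.
> {code}
> import java.util.PriorityQueue;
>
> public class StuckHeapDemo {
>   static class ToyScanner implements Comparable<ToyScanner> {
>     long currentKey = 0;                         // a corrupted scanner never moves this forward
>     void seekPast(long key) { /* pretends to seek but stays on the same key */ }
>     public int compareTo(ToyScanner other) { return Long.compare(currentKey, other.currentKey); }
>   }
>
>   public static void main(String[] args) {
>     PriorityQueue<ToyScanner> heap = new PriorityQueue<>();
>     heap.add(new ToyScanner());
>     long seekKey = 10;
>     int polls = 0;
>     ToyScanner scanner;
>     while ((scanner = heap.poll()) != null) {    // same shape as the loop above
>       scanner.seekPast(seekKey);
>       if (scanner.currentKey < seekKey) {
>         heap.add(scanner);                       // no progress, so it goes straight back in
>       }
>       if (++polls == 1000000) {                  // safety valve for the demo; the real loop has none
>         System.out.println("still polling the same stuck scanner after " + polls + " polls");
>         break;
>       }
>     }
>   }
> }
> {code}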



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
