hbase-issues mailing list archives

From "Jerry He (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-12949) Scanner can be stuck in infinite loop if the HFile is corrupted
Date Mon, 09 Feb 2015 23:08:36 GMT

    [ https://issues.apache.org/jira/browse/HBASE-12949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313108#comment-14313108 ]

Jerry He commented on HBASE-12949:

Hi, [~stack]

bq. In the KV constructor, it is going to decode sizes anyway as part of the parse? Check at this point rather than apart in the CellUtil I suggested above?
I looked through the constructors and methods in KeyValue and how they are used. It looks like the bytes for row length, family length and type are decoded lazily, as needed, by the upper-level callers at the time of use. Also, it looks like we cannot do the checking in the getters at conversion time, because they are called on artificial/fake keys as well.
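
For reference, a KeyValue key is laid out as <rowLength:short><row><familyLength:byte><family><qualifier><timestamp:long><type:byte>; the qualifier length is never stored and is derived by subtraction. A minimal sketch of the kind of structural check being discussed, assuming that layout and signed length fields. KeyLayoutCheck and isKeyStructureValid are hypothetical names, not HBase methods:

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not part of HBase. Assumed key layout:
// <rowLength:short><row><familyLength:byte><family><qualifier><timestamp:long><type:byte>
class KeyLayoutCheck {
    static final int ROW_LENGTH_SIZE = 2;    // short
    static final int FAMILY_LENGTH_SIZE = 1; // byte
    static final int TIMESTAMP_SIZE = 8;     // long
    static final int TYPE_SIZE = 1;          // byte
    // Fixed overhead: the size of a key whose row, family and qualifier are all empty.
    static final int MIN_KEY_LENGTH =
        ROW_LENGTH_SIZE + FAMILY_LENGTH_SIZE + TIMESTAMP_SIZE + TYPE_SIZE;

    // Returns true if the key at buf[keyOffset, keyOffset + keyLength) is structurally plausible.
    static boolean isKeyStructureValid(byte[] buf, int keyOffset, int keyLength) {
        if (keyOffset < 0 || keyLength < MIN_KEY_LENGTH || keyLength > buf.length - keyOffset) {
            return false;
        }
        short rowLength = ByteBuffer.wrap(buf).getShort(keyOffset); // big-endian, signed
        // After the row there must still be room for the family length, timestamp and type.
        if (rowLength < 0 || MIN_KEY_LENGTH + rowLength > keyLength) {
            return false;
        }
        byte familyLength = buf[keyOffset + ROW_LENGTH_SIZE + rowLength];
        if (familyLength < 0) {
            return false;
        }
        // The qualifier length is whatever is left over; it must not be negative.
        int qualifierLength = keyLength - MIN_KEY_LENGTH - rowLength - familyLength;
        return qualifierLength >= 0;
    }

    public static void main(String[] args) {
        byte[] key = ByteBuffer.allocate(16)
            .putShort((short) 2).put(new byte[] {'r', '1'}) // row "r1"
            .put((byte) 1).put((byte) 'f')                  // family "f"
            .put((byte) 'q')                                // qualifier "q"
            .putLong(1L).put((byte) 4)                      // timestamp 1, type code 4 (Put)
            .array();
        System.out.println(isKeyStructureValid(key, 0, 16)); // true
        System.out.println(isKeyStructureValid(key, 0, 0));  // false: shorter than any key
    }
}
```

A 12-byte all-zero key (empty row, family and qualifier, timestamp 0, type code 0) still passes, so a check like this would complement the keyLength check rather than replace it. As noted above, it also could not simply live in the KeyValue getters, since those run on fake keys too.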

I added a v2 of the patch, which only enhances the keyLength check. It is not a complete fix, but it is an incremental improvement :-) and hopefully less controversial.
It caught the corruption case I encountered in this JIRA:

{noformat}
Exception in thread "main" java.lang.IllegalStateException: Invalid currKeyLen 0 or currValueLen 0. Block offset: 2853145836, block length: 65580, position: 29457 (without header).
        at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.readKeyValueLen(HFileReaderV2.java:884)
        at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.next(HFileReaderV2.java:790)
        at org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:136)
        at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:108)
        at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:507)
        at org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:223)
        at org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:77)
{noformat}
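
A minimal sketch of the kind of fail-fast length check described above, assuming each cell in a block starts with <keyLength:int><valueLength:int> in big-endian order, followed by the key and value bytes. KeyValueLenReader and its 12-byte minimum key length are hypothetical illustrations, not the contents of HBASE-12949-master-v2.patch:

```java
import java.nio.ByteBuffer;

// Hypothetical sketch, not the HBASE-12949 patch. Assumes each cell in the block
// starts with <keyLength:int><valueLength:int>, followed by the key and value bytes.
class KeyValueLenReader {
    // Assumed floor: rowLength(2) + familyLength(1) + timestamp(8) + type(1).
    static final int MIN_KEY_LENGTH = 12;
    static final int LENGTHS_SIZE = 2 * Integer.BYTES;

    // Reads and validates the two lengths at the buffer's current position
    // without moving it; throws instead of returning garbage lengths.
    static int[] readKeyValueLen(ByteBuffer block, long blockOffset) {
        int pos = block.position();
        if (block.remaining() < LENGTHS_SIZE) {
            throw new IllegalStateException("Truncated cell header. Block offset: "
                + blockOffset + ", position: " + pos);
        }
        int keyLen = block.getInt(pos);
        int valueLen = block.getInt(pos + Integer.BYTES);
        long available = block.remaining() - LENGTHS_SIZE;
        // A zero or tiny key length can never belong to a real key: rejecting it here
        // surfaces the corruption instead of letting the scanner walk through garbage.
        if (keyLen < MIN_KEY_LENGTH || valueLen < 0 || (long) keyLen + valueLen > available) {
            throw new IllegalStateException("Invalid keyLen " + keyLen + " or valueLen "
                + valueLen + ". Block offset: " + blockOffset + ", block limit: "
                + block.limit() + ", position: " + pos);
        }
        return new int[] {keyLen, valueLen};
    }

    public static void main(String[] args) {
        ByteBuffer ok = ByteBuffer.allocate(LENGTHS_SIZE + 16 + 3);
        ok.putInt(0, 16).putInt(4, 3); // absolute puts leave the position at 0
        int[] lens = readKeyValueLen(ok, 0L);
        System.out.println(lens[0] + " " + lens[1]); // 16 3

        ByteBuffer zeros = ByteBuffer.allocate(64); // zero lengths, as in the trace above
        try {
            readKeyValueLen(zeros, 0L);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Rejecting a zero key length up front is what turns a silent walk through garbage into an IllegalStateException like the one in the trace above; the exact bounds the real patch uses may differ.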

> Scanner can be stuck in infinite loop if the HFile is corrupted
> ---------------------------------------------------------------
>                 Key: HBASE-12949
>                 URL: https://issues.apache.org/jira/browse/HBASE-12949
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.3, 0.98.10
>            Reporter: Jerry He
>         Attachments: HBASE-12949-master-v2.patch, HBASE-12949-master.patch
> We've encountered a problem where compaction hangs and never completes.
> After looking into it further, we found that the compaction scanner was stuck in an infinite loop. See the stack below.
> {noformat}
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:296)
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.reseek(KeyValueHeap.java:257)
> org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:697)
> org.apache.hadoop.hbase.regionserver.StoreScanner.seekToNextRow(StoreScanner.java:672)
> org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:529)
> org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:223)
> {noformat}
> We identified the HFile that appears to be corrupted. Running the HFile tool on it shows the following:
> {noformat}
> [biadmin@hdtest009 bin]$ hbase org.apache.hadoop.hbase.io.hfile.HFile -v -k -m -f /user/biadmin/CUMMINS_INSITE_V1/7106432d294dd844be15996ccbf2ba84/attributes/f1a7e3113c2c4047ac1fc8fbcb41d8b7
> 15/01/23 11:53:17 INFO Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
> 15/01/23 11:53:18 INFO util.ChecksumType: Checksum using org.apache.hadoop.util.PureJavaCrc32
> 15/01/23 11:53:18 INFO util.ChecksumType: Checksum can use org.apache.hadoop.util.PureJavaCrc32C
> 15/01/23 11:53:18 INFO Configuration.deprecation: fs.default.name is deprecated. Instead, use fs.defaultFS
> Scanning -> /user/biadmin/CUMMINS_INSITE_V1/7106432d294dd844be15996ccbf2ba84/attributes/f1a7e3113c2c4047ac1fc8fbcb41d8b7
> WARNING, previous row is greater then current row
>         filename -> /user/biadmin/CUMMINS_INSITE_V1/7106432d294dd844be15996ccbf2ba84/attributes/f1a7e3113c2c4047ac1fc8fbcb41d8b7
>         previous -> \x00/20110203-094231205-79442793-1410161293068203000\x0Aattributes16794406\x00\x00\x01\x00\x00\x00\x00\x00\x00
>         current  ->
> Exception in thread "main" java.nio.BufferUnderflowException
>         at java.nio.Buffer.nextGetIndex(Buffer.java:489)
>         at java.nio.HeapByteBuffer.getInt(HeapByteBuffer.java:347)
>         at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.readKeyValueLen(HFileReaderV2.java:856)
>         at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.next(HFileReaderV2.java:768)
>         at org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.scanKeysValues(HFilePrettyPrinter.java:362)
>         at org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.processFile(HFilePrettyPrinter.java:262)
>         at org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.run(HFilePrettyPrinter.java:220)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>         at org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.main(HFilePrettyPrinter.java:539)
>         at org.apache.hadoop.hbase.io.hfile.HFile.main(HFile.java:802)
> {noformat}
> Running with Java assertions enabled shows the following:
> {noformat}
> Exception in thread "main" java.lang.AssertionError: Key 20110203-094231205-79442793-1410161293068203000/attributes:16794406/1099511627776/Minimum/vlen=15/mvcc=0 followed by a smaller key //0/Minimum/vlen=0/mvcc=0 in cf attributes
>         at org.apache.hadoop.hbase.regionserver.StoreScanner.checkScanOrder(StoreScanner.java:672)
> {noformat}
> This suggests the HFile is corrupted: the keys are not valid and not in sorted order.
> However, instead of reporting a meaningful error, the scanner gets stuck in an infinite loop in
> {code}
> // KeyValueHeap.generalizedSeek()
> while ((scanner = heap.poll()) != null) {
>   // ... loop body not shown in the report ...
> }
> {code}
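
With assertions enabled, StoreScanner.checkScanOrder flags the out-of-order key instead of looping, as the description above shows. A minimal sketch of a check that fails fast even with assertions off might look like this. ScanOrderCheck is hypothetical, and comparing raw key bytes is a simplification: HBase's real comparator orders by row, family and qualifier, then by timestamp (newest first) and type.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical always-on scan-order check, not HBase code. It simplifies the real
// key comparison to an unsigned lexicographic comparison of raw key bytes.
class ScanOrderCheck {
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return a.length - b.length; // a shorter prefix sorts first
    }

    // Throws if curKey sorts before prevKey, instead of only asserting.
    static void checkScanOrder(byte[] prevKey, byte[] curKey) {
        if (prevKey != null && compareUnsigned(prevKey, curKey) > 0) {
            throw new IllegalStateException("Key " + Arrays.toString(prevKey)
                + " followed by a smaller key " + Arrays.toString(curKey));
        }
    }

    public static void main(String[] args) {
        byte[] row = "20110203".getBytes(StandardCharsets.US_ASCII);
        checkScanOrder(null, row); // first key: nothing to compare against
        try {
            checkScanOrder(row, new byte[0]); // an empty key after a non-empty one
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

This only illustrates turning the ordering violation into an exception; whether such a check is cheap enough to run on every key in the scan path is a separate question.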

This message was sent by Atlassian JIRA
