Return-Path: X-Original-To: apmail-hbase-issues-archive@www.apache.org Delivered-To: apmail-hbase-issues-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 7C93C10480 for ; Mon, 9 Feb 2015 23:08:36 +0000 (UTC) Received: (qmail 55897 invoked by uid 500); 9 Feb 2015 23:08:36 -0000 Delivered-To: apmail-hbase-issues-archive@hbase.apache.org Received: (qmail 55855 invoked by uid 500); 9 Feb 2015 23:08:36 -0000 Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list issues@hbase.apache.org Received: (qmail 55842 invoked by uid 99); 9 Feb 2015 23:08:36 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 09 Feb 2015 23:08:36 +0000 Date: Mon, 9 Feb 2015 23:08:36 +0000 (UTC) From: "Jerry He (JIRA)" To: issues@hbase.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (HBASE-12949) Scanner can be stuck in infinite loop if the HFile is corrupted MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/HBASE-12949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313108#comment-14313108 ] Jerry He commented on HBASE-12949: ---------------------------------- Hi, [~stack] bq. In the KV constructor, it is going to decode sizes anyway as part of the parse? Check at this point rather than apart in the CellUtil I suggested above? I looked through the constructors and methods in KeyValue and their use. It looks the decoding of the bytes for row length, family length, type are done as needed by the upper callers at the time of use. Also it looks we can not do the checking in the getters at the conversion time because they are called on the artificial/fake keys as well. I added a v2 of the patch, which only enhanced the check of keyLength. It is not a complete fix, but it does incremental good :-) and less controversial ? It caught the corruption case I encountered for this JIRA: {noformat} Exception in thread "main" java.lang.IllegalStateException: Invalid currKeyLen 0 or currValueLen 0. Block offset: 2853145836, block length: 65580, position: 29457 (without header). at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.readKeyValueLen(HFileReaderV2.java:884) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.next(HFileReaderV2.java:790) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:136) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:108) at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:507) at org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:223) at org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:77) {noformat} > Scanner can be stuck in infinite loop if the HFile is corrupted > --------------------------------------------------------------- > > Key: HBASE-12949 > URL: https://issues.apache.org/jira/browse/HBASE-12949 > Project: HBase > Issue Type: Bug > Affects Versions: 0.94.3, 0.98.10 > Reporter: Jerry He > Attachments: HBASE-12949-master-v2.patch, HBASE-12949-master.patch > > > We've encountered problem where compaction hangs and never completes. > After looking into it further, we found that the compaction scanner was stuck in a infinite loop. See stack below. > {noformat} > org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:296) > org.apache.hadoop.hbase.regionserver.KeyValueHeap.reseek(KeyValueHeap.java:257) > org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:697) > org.apache.hadoop.hbase.regionserver.StoreScanner.seekToNextRow(StoreScanner.java:672) > org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:529) > org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:223) > {noformat} > We identified the hfile that seems to be corrupted. Using HFile tool shows the following: > {noformat} > [biadmin@hdtest009 bin]$ hbase org.apache.hadoop.hbase.io.hfile.HFile -v -k -m -f /user/biadmin/CUMMINS_INSITE_V1/7106432d294dd844be15996ccbf2ba84/attributes/f1a7e3113c2c4047ac1fc8fbcb41d8b7 > 15/01/23 11:53:17 INFO Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available > 15/01/23 11:53:18 INFO util.ChecksumType: Checksum using org.apache.hadoop.util.PureJavaCrc32 > 15/01/23 11:53:18 INFO util.ChecksumType: Checksum can use org.apache.hadoop.util.PureJavaCrc32C > 15/01/23 11:53:18 INFO Configuration.deprecation: fs.default.name is deprecated. Instead, use fs.defaultFS > Scanning -> /user/biadmin/CUMMINS_INSITE_V1/7106432d294dd844be15996ccbf2ba84/attributes/f1a7e3113c2c4047ac1fc8fbcb41d8b7 > WARNING, previous row is greater then current row > filename -> /user/biadmin/CUMMINS_INSITE_V1/7106432d294dd844be15996ccbf2ba84/attributes/f1a7e3113c2c4047ac1fc8fbcb41d8b7 > previous -> \x00/20110203-094231205-79442793-1410161293068203000\x0Aattributes16794406\x00\x00\x01\x00\x00\x00\x00\x00\x00 > current -> > Exception in thread "main" java.nio.BufferUnderflowException > at java.nio.Buffer.nextGetIndex(Buffer.java:489) > at java.nio.HeapByteBuffer.getInt(HeapByteBuffer.java:347) > at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.readKeyValueLen(HFileReaderV2.java:856) > at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.next(HFileReaderV2.java:768) > at org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.scanKeysValues(HFilePrettyPrinter.java:362) > at org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.processFile(HFilePrettyPrinter.java:262) > at org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.run(HFilePrettyPrinter.java:220) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.main(HFilePrettyPrinter.java:539) > at org.apache.hadoop.hbase.io.hfile.HFile.main(HFile.java:802) > {noformat} > Turning on Java Assert shows the following: > {noformat} > Exception in thread "main" java.lang.AssertionError: Key 20110203-094231205-79442793-1410161293068203000/attributes:16794406/1099511627776/Minimum/vlen=15/mvcc=0 followed by a smaller key //0/Minimum/vlen=0/mvcc=0 in cf attributes > at org.apache.hadoop.hbase.regionserver.StoreScanner.checkScanOrder(StoreScanner.java:672) > {noformat} > It shows that the hfile seems to be corrupted -- the keys don't seem to be right. > But Scanner is not able to give a meaningful error, but stuck in an infinite loop in here: > {code} > KeyValueHeap.generalizedSeek() > while ((scanner = heap.poll()) != null) { > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)