Date: Tue, 14 Jun 2016 16:48:57 +0000 (UTC)
From: "stack (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Updated] (HBASE-12949) Scanner can be stuck in infinite loop if the HFile is corrupted

     [ https://issues.apache.org/jira/browse/HBASE-12949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-12949:
--------------------------
    Attachment: HBASE-12949-master-v2.patch

Retry. We never committed this [~jerryhe]?

> Scanner can be stuck in infinite loop if the HFile is corrupted
> ---------------------------------------------------------------
>
>                 Key: HBASE-12949
>                 URL: https://issues.apache.org/jira/browse/HBASE-12949
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.3, 0.98.10
>            Reporter: Jerry He
>         Attachments: HBASE-12949-master-v2 (1).patch, HBASE-12949-master-v2.patch, HBASE-12949-master-v2.patch, HBASE-12949-master-v2.patch, HBASE-12949-master.patch
>
>
> We've encountered a problem where compaction hangs and never completes.
> After looking into it further, we found that the compaction scanner was stuck in an infinite loop. See stack below.
> {noformat}
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:296)
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.reseek(KeyValueHeap.java:257)
> org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:697)
> org.apache.hadoop.hbase.regionserver.StoreScanner.seekToNextRow(StoreScanner.java:672)
> org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:529)
> org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:223)
> {noformat}
> We identified the hfile that seems to be corrupted. Using the HFile tool shows the following:
> {noformat}
> [biadmin@hdtest009 bin]$ hbase org.apache.hadoop.hbase.io.hfile.HFile -v -k -m -f /user/biadmin/CUMMINS_INSITE_V1/7106432d294dd844be15996ccbf2ba84/attributes/f1a7e3113c2c4047ac1fc8fbcb41d8b7
> 15/01/23 11:53:17 INFO Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
> 15/01/23 11:53:18 INFO util.ChecksumType: Checksum using org.apache.hadoop.util.PureJavaCrc32
> 15/01/23 11:53:18 INFO util.ChecksumType: Checksum can use org.apache.hadoop.util.PureJavaCrc32C
> 15/01/23 11:53:18 INFO Configuration.deprecation: fs.default.name is deprecated. Instead, use fs.defaultFS
> Scanning -> /user/biadmin/CUMMINS_INSITE_V1/7106432d294dd844be15996ccbf2ba84/attributes/f1a7e3113c2c4047ac1fc8fbcb41d8b7
> WARNING, previous row is greater then current row
>         filename -> /user/biadmin/CUMMINS_INSITE_V1/7106432d294dd844be15996ccbf2ba84/attributes/f1a7e3113c2c4047ac1fc8fbcb41d8b7
>         previous -> \x00/20110203-094231205-79442793-1410161293068203000\x0Aattributes16794406\x00\x00\x01\x00\x00\x00\x00\x00\x00
>         current  ->
> Exception in thread "main" java.nio.BufferUnderflowException
>         at java.nio.Buffer.nextGetIndex(Buffer.java:489)
>         at java.nio.HeapByteBuffer.getInt(HeapByteBuffer.java:347)
>         at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.readKeyValueLen(HFileReaderV2.java:856)
>         at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.next(HFileReaderV2.java:768)
>         at org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.scanKeysValues(HFilePrettyPrinter.java:362)
>         at org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.processFile(HFilePrettyPrinter.java:262)
>         at org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.run(HFilePrettyPrinter.java:220)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>         at org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.main(HFilePrettyPrinter.java:539)
>         at org.apache.hadoop.hbase.io.hfile.HFile.main(HFile.java:802)
> {noformat}
> Turning on Java assertions shows the following:
> {noformat}
> Exception in thread "main" java.lang.AssertionError: Key 20110203-094231205-79442793-1410161293068203000/attributes:16794406/1099511627776/Minimum/vlen=15/mvcc=0 followed by a smaller key //0/Minimum/vlen=0/mvcc=0 in cf attributes
>         at org.apache.hadoop.hbase.regionserver.StoreScanner.checkScanOrder(StoreScanner.java:672)
> {noformat}
> It shows that the hfile seems to be corrupted -- the keys don't seem to be right.
> But the scanner cannot report a meaningful error; instead it gets stuck in an infinite loop here:
> {code}
> KeyValueHeap.generalizedSeek()
> while ((scanner = heap.poll()) != null) {
> }
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
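
For illustration only: the report describes a key followed by a smaller (empty) key, which is caught only when Java assertions are enabled. A minimal sketch of the same ordering check, written so that it fails with an exception regardless of the -ea flag, might look like the following. The class ScanOrderCheck and its methods are hypothetical names invented for this example; they are not HBase APIs and this is not the contents of the attached patch.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper, not part of HBase: fails fast when keys go backwards. */
final class ScanOrderCheck {

    /** Unsigned lexicographic byte comparison (assumed ordering for raw keys). */
    static int compareKeys(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        // Equal common prefix: the shorter key sorts first.
        return a.length - b.length;
    }

    /**
     * Walks the keys once and throws, rather than continuing, when a key is
     * smaller than its predecessor. Returns the number of keys checked.
     */
    static int verifyAscending(List<byte[]> keys) throws IOException {
        byte[] previous = null;
        int index = 0;
        for (byte[] current : keys) {
            if (previous != null && compareKeys(previous, current) > 0) {
                throw new IOException("Key at index " + index
                    + " is smaller than its predecessor; file may be corrupted");
            }
            previous = current;
            index++;
        }
        return index;
    }

    public static void main(String[] args) {
        List<byte[]> keys = Arrays.asList(
            "row-a".getBytes(StandardCharsets.UTF_8),
            "row-b".getBytes(StandardCharsets.UTF_8),
            new byte[0]); // an empty key after a non-empty one, as in the report
        try {
            verifyAscending(keys);
            System.out.println("keys are in order");
        } catch (IOException e) {
            System.out.println("scan aborted: " + e.getMessage());
        }
    }
}
```

The design point is that a checked exception surfaces the corruption to the caller in production, whereas an assert is silently skipped unless the JVM runs with assertions enabled.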