hbase-issues mailing list archives

From: Asim Zafir <maza...@gmail.com>
Subject: Re: [jira] [Commented] (HBASE-12949) Scanner can be stuck in infinite loop if the HFile is corrupted
Date: Thu, 05 Feb 2015 20:47:12 GMT
Jerry,

What is a quick and easy way to monitor for corrupted HFiles? We are using HBase 0.98.

Thanks 

Asim



> On Feb 5, 2015, at 10:47 AM, Jerry He (JIRA) <jira@apache.org> wrote:
> 
> 
>    [ https://issues.apache.org/jira/browse/HBASE-12949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307748#comment-14307748 ]
> 
> Jerry He commented on HBASE-12949:
> ----------------------------------
> 
> Hi, [~stack], [~ram_krish]
> I agree.
> It is a balancing act between checking and not checking, and between checking more and checking less.
> We can check less; for example, check only the type.
> 
> Another option I can think of is that we have a property (say 'SanityCheckCell').
> We would only do the checking when reading the cells if the property is set to true,
> for people who want a strong cell sanity check, or for people lacking strong FileSystem protection (checksums, etc.).
> 
> What do you think? 
> 
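As a rough sketch of the 'SanityCheckCell' idea above (the configuration key name, the helper method, and the length check here are hypothetical placeholders, not code from the attached patch), the gate might look something like this:

{code}
// Sketch only: a config-gated per-cell sanity check, as proposed above.
// The key name and the check itself are hypothetical.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class CellSanityCheckSketch {

  // Hypothetical flag standing in for the proposed 'SanityCheckCell' property.
  static final String SANITY_CHECK_CELL_KEY = "hbase.scanner.sanity.check.cell";

  private final boolean sanityCheckCell;

  CellSanityCheckSketch(Configuration conf) {
    // Read the flag once, when the reader/scanner is created; default is off,
    // so users with strong filesystem protection (checksums, etc.) pay nothing.
    this.sanityCheckCell = conf.getBoolean(SANITY_CHECK_CELL_KEY, false);
  }

  /** Called for every cell decoded from an HFile block. */
  void onCellRead(int keyLen, int valueLen, int bytesLeftInBlock) {
    if (!sanityCheckCell) {
      return; // checking disabled: keep the read path as cheap as today
    }
    // A cheap structural check: lengths must be non-negative and fit in the block.
    if (keyLen < 0 || valueLen < 0 || (long) keyLen + valueLen > bytesLeftInBlock) {
      throw new IllegalStateException("Corrupt cell: keyLen=" + keyLen
          + ", valueLen=" + valueLen + ", bytesLeftInBlock=" + bytesLeftInBlock);
    }
  }

  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    conf.setBoolean(SANITY_CHECK_CELL_KEY, true); // opt in, per the proposal
    new CellSanityCheckSketch(conf).onCellRead(24, 15, 64 * 1024);
    System.out.println("cell passed the sanity check");
  }
}
{code}

The intent is that the default keeps the read path as cheap as it is today, and only users who opt in pay for the extra per-cell validation.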
>> Scanner can be stuck in infinite loop if the HFile is corrupted
>> ---------------------------------------------------------------
>> 
>>                Key: HBASE-12949
>>                URL: https://issues.apache.org/jira/browse/HBASE-12949
>>            Project: HBase
>>         Issue Type: Bug
>>   Affects Versions: 0.94.3, 0.98.10
>>           Reporter: Jerry He
>>        Attachments: HBASE-12949-master.patch
>> 
>> 
>> We've encountered a problem where compaction hangs and never completes.
>> After looking into it further, we found that the compaction scanner was stuck in an infinite loop. See the stack below.
>> {noformat}
>> org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:296)
>> org.apache.hadoop.hbase.regionserver.KeyValueHeap.reseek(KeyValueHeap.java:257)
>> org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:697)
>> org.apache.hadoop.hbase.regionserver.StoreScanner.seekToNextRow(StoreScanner.java:672)
>> org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:529)
>> org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:223)
>> {noformat}
>> We identified the HFile that seems to be corrupted. Using the HFile tool shows the following:
>> {noformat}
>> [biadmin@hdtest009 bin]$ hbase org.apache.hadoop.hbase.io.hfile.HFile -v -k -m -f /user/biadmin/CUMMINS_INSITE_V1/7106432d294dd844be15996ccbf2ba84/attributes/f1a7e3113c2c4047ac1fc8fbcb41d8b7
>> 15/01/23 11:53:17 INFO Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
>> 15/01/23 11:53:18 INFO util.ChecksumType: Checksum using org.apache.hadoop.util.PureJavaCrc32
>> 15/01/23 11:53:18 INFO util.ChecksumType: Checksum can use org.apache.hadoop.util.PureJavaCrc32C
>> 15/01/23 11:53:18 INFO Configuration.deprecation: fs.default.name is deprecated. Instead, use fs.defaultFS
>> Scanning -> /user/biadmin/CUMMINS_INSITE_V1/7106432d294dd844be15996ccbf2ba84/attributes/f1a7e3113c2c4047ac1fc8fbcb41d8b7
>> WARNING, previous row is greater then current row
>>        filename -> /user/biadmin/CUMMINS_INSITE_V1/7106432d294dd844be15996ccbf2ba84/attributes/f1a7e3113c2c4047ac1fc8fbcb41d8b7
>>        previous -> \x00/20110203-094231205-79442793-1410161293068203000\x0Aattributes16794406\x00\x00\x01\x00\x00\x00\x00\x00\x00
>>        current  ->
>> Exception in thread "main" java.nio.BufferUnderflowException
>>        at java.nio.Buffer.nextGetIndex(Buffer.java:489)
>>        at java.nio.HeapByteBuffer.getInt(HeapByteBuffer.java:347)
>>        at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.readKeyValueLen(HFileReaderV2.java:856)
>>        at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.next(HFileReaderV2.java:768)
>>        at org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.scanKeysValues(HFilePrettyPrinter.java:362)
>>        at org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.processFile(HFilePrettyPrinter.java:262)
>>        at org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.run(HFilePrettyPrinter.java:220)
>>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>>        at org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.main(HFilePrettyPrinter.java:539)
>>        at org.apache.hadoop.hbase.io.hfile.HFile.main(HFile.java:802)
>> {noformat}
>> Turning on Java assertions shows the following:
>> {noformat}
>> Exception in thread "main" java.lang.AssertionError: Key 20110203-094231205-79442793-1410161293068203000/attributes:16794406/1099511627776/Minimum/vlen=15/mvcc=0 followed by a smaller key //0/Minimum/vlen=0/mvcc=0 in cf attributes
>>        at org.apache.hadoop.hbase.regionserver.StoreScanner.checkScanOrder(StoreScanner.java:672)
>> {noformat}
>> This shows that the HFile seems to be corrupted: the keys don't look right.
>> But the scanner is not able to give a meaningful error; instead it is stuck in an infinite loop here:
>> {code}
>> // loop in KeyValueHeap.generalizedSeek():
>> while ((scanner = heap.poll()) != null) {
>>   ...
>> }
>> {code}
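As an illustration of why a loop of that shape can fail to terminate, here is a small self-contained sketch (plain Java, not HBase source; the FakeScanner class and the iteration guard are invented for the example): if the polled scanner's seek never makes progress, for instance because the underlying keys are out of order, and the scanner is put back on the heap, the same element gets polled again and again.

{code}
// Standalone illustration of the stuck-loop pattern; not HBase source.
import java.util.PriorityQueue;

public class StuckSeekSketch {

  // Minimal stand-in for a store file scanner.
  static class FakeScanner implements Comparable<FakeScanner> {
    long currentKey;
    final boolean corrupted; // simulates a file whose keys are out of order

    FakeScanner(long currentKey, boolean corrupted) {
      this.currentKey = currentKey;
      this.corrupted = corrupted;
    }

    /** Returns true if the scanner is now positioned at or past seekKey. */
    boolean reseek(long seekKey) {
      if (corrupted) {
        return false; // never advances, so the caller never succeeds
      }
      currentKey = Math.max(currentKey, seekKey);
      return true;
    }

    @Override
    public int compareTo(FakeScanner other) {
      return Long.compare(currentKey, other.currentKey);
    }
  }

  public static void main(String[] args) {
    PriorityQueue<FakeScanner> heap = new PriorityQueue<>();
    heap.add(new FakeScanner(5, true)); // the scanner over the bad file
    long seekKey = 10;
    long polls = 0;

    FakeScanner scanner;
    // Same shape as the loop in the report above: poll, try to seek, re-add.
    while ((scanner = heap.poll()) != null) {
      if (scanner.reseek(seekKey)) {
        break; // normal case: the scanner advanced and we can use it
      }
      heap.add(scanner); // no progress was made, so it is queued again
      if (++polls >= 1_000_000) { // guard so this demo actually terminates
        System.out.println("no progress after " + polls + " polls; would loop forever");
        break;
      }
    }
  }
}
{code}

In the report above, the corrupted HFile appears to play the role of the scanner that cannot advance, which matches the stack trace showing the compaction scanner never leaving generalizedSeek.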
> 
> 
> 
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)
