hbase-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Victor Xu (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HBASE-9646) SplitLogWorker throws IndexOutOfBoundsException while reading compressed hlog
Date Tue, 24 Sep 2013 14:40:03 GMT
Victor Xu created HBASE-9646:
--------------------------------

             Summary: SplitLogWorker throws IndexOutOfBoundsException while reading compressed
hlog
                 Key: HBASE-9646
                 URL: https://issues.apache.org/jira/browse/HBASE-9646
             Project: HBase
          Issue Type: Bug
          Components: wal
    Affects Versions: 0.94.10
         Environment: Linux 2.6.32-el5.x86_64
            Reporter: Victor Xu


This case just happened after a regionserver halt.([HBASE-9645|https://issues.apache.org/jira/browse/HBASE-9645])

As we know, when a RS was down, the master would direct other live RS to split the hlog remains
of the dead RS. Then, after the whole splitting, it reassign the regions to other live RS.


The case I described here is that one of the hlog file of the dead RS cannot be splitted.
This hlog splitting task had been resubmitted by master several times to different RS, but
all of them failed.

I checked one of the RS's regionserver log, it had some read error:
{noformat}
2013-09-19 15:54:53,082 INFO org.apache.hadoop.hbase.regionserver.SplitLogWorker: worker hadoop250.cm6,60020,1378886880990
acquired task /hbase/splitlog/hdfs%3A%2F%2Fhadoopnnvip.cm6%3A9000%2Fhbase%2F.logs
%2Fhadoop230.cm6%2C60020%2C1378886880970-splitting%2Fhadoop230.cm6%252C60020%252C1378886880970.1379572773730
2013-09-19 15:54:53,083 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting
hlog: hdfs://hadoopnnvip.cm6:9000/hbase/.logs/hadoop230.cm6,60020,1378886880970-splitting/hadoop230.cm6%2C6002
0%2C1378886880970.1379572773730, length=1169560315
2013-09-19 15:54:53,083 INFO org.apache.hadoop.hbase.util.FSHDFSUtils: Recovering file hdfs://hadoopnnvip.cm6:9000/hbase/.logs/hadoop230.cm6,60020,1378886880970-splitting/hadoop230.cm6%2C60020%2C137888688
0970.1379572773730
2013-09-19 15:54:53,084 INFO org.apache.hadoop.hbase.util.FSHDFSUtils: Finished lease recovery
attempt for hdfs://hadoopnnvip.cm6:9000/hbase/.logs/hadoop230.cm6,60020,1378886880970-splitting/hadoop230.cm6
%2C60020%2C1378886880970.1379572773730
2013-09-19 15:54:53,166 WARN org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Found
existing old edits file. It could be the result of a previous failed split attempt. Deleting
hdfs://hadoopnnvip.cm
6:9000/hbase/page_content_queue/1af5b36fd3695e18ef85ea6d768a3e45/recovered.edits/0000000080485091467,
length=71240
2013-09-19 15:54:53,177 WARN org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Found
existing old edits file. It could be the result of a previous failed split attempt. Deleting
hdfs://hadoopnnvip.cm
6:9000/hbase/offline_dashboard_queue/2514b6f61f05a2616f1e3dd893a56da6/recovered.edits/0000000080485091468,
length=1242
2013-09-19 15:54:53,188 WARN org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Found
existing old edits file. It could be the result of a previous failed split attempt. Deleting
hdfs://hadoopnnvip.cm
6:9000/hbase/taobao_auction_rowkey_queue/45ba2b931f94d4d453977ea16bccda6b/recovered.edits/0000000080485091472,
length=704
2013-09-19 15:54:53,204 WARN org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Found
existing old edits file. It could be the result of a previous failed split attempt. Deleting
hdfs://hadoopnnvip.cm
6:9000/hbase/shangpin_alt_taoke_doc_queue/aabc3dc0ab1c9c1ead1239b6be4234ed/recovered.edits/0000000080485091471,
length=8998
2013-09-19 15:54:53,216 WARN org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Found
existing old edits file. It could be the result of a previous failed split attempt. Deleting
hdfs://hadoopnnvip.cm
6:9000/hbase/link_rowkey/eb004b12a0e1d3f467cadb19a53ca29e/recovered.edits/0000000080485091462,
length=612
2013-09-19 15:54:53,219 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Processed
13 edits across 5 regions threw away edits for 0 regions; log file=hdfs://hadoopnnvip.cm6:9000/hbase/.logs/hadoop230.cm6,60020,1378886880970-splitting/hadoop230.cm6%2C60020%2C1378886880970.1379572773730
is corrupted = false progress failed = false
2013-09-19 15:54:53,219 INFO org.apache.hadoop.hbase.regionserver.SplitLogWorker: worker hadoop250.cm6,60020,1378886880990
done with task /hbase/splitlog/hdfs%3A%2F%2Fhadoopnnvip.cm6%3A9000%2Fhbase%2F.logs%2Fhadoop230.cm6%2C60020%2C1378886880970-splitting%2Fhadoop230.cm6%252C60020%252C1378886880970.1379572773730
in 137ms
2013-09-19 15:54:53,219 ERROR org.apache.hadoop.hbase.regionserver.SplitLogWorker: unexpected
error 
java.lang.IndexOutOfBoundsException: index (5) must be less than size (5)
        at com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:301)
        at com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:280)
        at org.apache.hadoop.hbase.regionserver.wal.LRUDictionary$BidirectionalLRUMap.get(LRUDictionary.java:122)
        at org.apache.hadoop.hbase.regionserver.wal.LRUDictionary$BidirectionalLRUMap.access$000(LRUDictionary.java:69)
        at org.apache.hadoop.hbase.regionserver.wal.LRUDictionary.getEntry(LRUDictionary.java:40)
        at org.apache.hadoop.hbase.regionserver.wal.Compressor.readCompressed(Compressor.java:111)
        at org.apache.hadoop.hbase.regionserver.wal.HLogKey.readFields(HLogKey.java:307)
        at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2199)
        at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2239)
        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:245)
        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:214)
        at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getNextLogLine(HLogSplitter.java:813)
        at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:437)
        at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:381)
        at org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:112)
        at org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:280)
        at org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:211)
        at org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:179)
        at java.lang.Thread.run(Thread.java:662)
2013-09-19 15:54:53,219 INFO org.apache.hadoop.hbase.regionserver.SplitLogWorker: SplitLogWorker
hadoop250.cm6,60020,1378886880990 exiting
{noformat}

I then used the HLog dump tool to analyze the hlog file. Here is what I got:
{noformat}
$ hbase org.apache.hadoop.hbase.regionserver.wal.HLog --dump /user/hadoop/hbase_admin/logs/hadoop230.cm6%2C60020%2C1378886880970.1379572773730
13/09/24 14:24:38 WARN conf.Configuration: fs.default.name is deprecated. Instead, use fs.defaultFS
Sequence 80485091459 from region aabc3dc0ab1c9c1ead1239b6be4234ed in table shangpin_alt_taoke_doc_queue
  Action:
    row: 00015:0000001379572773802:00001
    column: message:789
    at time: Thu Sep 19 14:39:33 CST 2013
Sequence 80485091460 from region aabc3dc0ab1c9c1ead1239b6be4234ed in table shangpin_alt_taoke_doc_queue
  Action:
    row: 00015:0000001379572773799:00001
    column: message:789
    at time: Thu Sep 19 14:39:33 CST 2013
Sequence 80485091461 from region eb004b12a0e1d3f467cadb19a53ca29e in table link_rowkey
  Action:
    row: 191:1379572773794
    column: link_crawler:15acca08d58e374208eb603aaa4c58e8_1379572773791_345445
    at time: Thu Sep 19 14:39:33 CST 2013
Sequence 80485091462 from region eb004b12a0e1d3f467cadb19a53ca29e in table link_rowkey
  Action:
    row: 191:1379572773793
    column: link_crawler:f8a52947d7a0e794204f118fc7c5fb5a_1379572773791_916104
    at time: Thu Sep 19 14:39:33 CST 2013
Sequence 80485091463 from region 1af5b36fd3695e18ef85ea6d768a3e45 in table page_content_queue
  Action:
    row: 00039:0000001379572773776:00001
    column: message:content
    at time: Thu Sep 19 14:39:33 CST 2013
Sequence 80485091464 from region 2514b6f61f05a2616f1e3dd893a56da6 in table offline_dashboard_queue
  Action:
    row: 00082:0000001379572773772:00001
    column: message:parser
    at time: Thu Sep 19 14:39:33 CST 2013
Sequence 80485091465 from region 2514b6f61f05a2616f1e3dd893a56da6 in table offline_dashboard_queue
  Action:
    row: 00082:0000001379572773770:00001
    column: message:parser
    at time: Thu Sep 19 14:39:33 CST 2013
Sequence 80485091466 from region 45ba2b931f94d4d453977ea16bccda6b in table taobao_auction_rowkey_queue
  Action:
    row: 00095:0000001379572773768:00001
    column: message:taobao_auction
    at time: Thu Sep 19 14:39:33 CST 2013
  Action:
    row: 00095:0000001379572773768:00002
    column: message:taobao_auction
    at time: Thu Sep 19 14:39:33 CST 2013
  Action:
    row: 00095:0000001379572773768:00003
    column: message:taobao_auction
    at time: Thu Sep 19 14:39:33 CST 2013
Sequence 80485091467 from region 1af5b36fd3695e18ef85ea6d768a3e45 in table page_content_queue
  Action:
    row: 00039:0000001379572773766:00001
    column: message:content
    at time: Thu Sep 19 14:39:33 CST 2013
Sequence 80485091468 from region 2514b6f61f05a2616f1e3dd893a56da6 in table offline_dashboard_queue
  Action:
    row: 00082:0000001379572773759:00001
    column: message:parser
    at time: Thu Sep 19 14:39:33 CST 2013
Sequence 80485091469 from region aabc3dc0ab1c9c1ead1239b6be4234ed in table shangpin_alt_taoke_doc_queue
  Action:
    row: 00015:0000001379572773756:00001
    column: message:789
    at time: Thu Sep 19 14:39:33 CST 2013
Sequence 80485091471 from region aabc3dc0ab1c9c1ead1239b6be4234ed in table shangpin_alt_taoke_doc_queue
  Action:
    row: 00015:0000001379572773755:00001
    column: message:789
    at time: Thu Sep 19 14:39:33 CST 2013
Sequence 80485091472 from region 45ba2b931f94d4d453977ea16bccda6b in table taobao_auction_rowkey_queue
  Action:
    row: 00095:0000001379572773754:00001
    column: message:taobao_auction
    at time: Thu Sep 19 14:39:33 CST 2013
  Action:
    row: 00095:0000001379572773754:00002
    column: message:taobao_auction
    at time: Thu Sep 19 14:39:33 CST 2013
  Action:
    row: 00095:0000001379572773754:00003
    column: message:taobao_auction
    at time: Thu Sep 19 14:39:33 CST 2013
Exception in thread "main" java.lang.IndexOutOfBoundsException: index (5) must be less than
size (5)
	at com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:301)
	at com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:280)
	at org.apache.hadoop.hbase.regionserver.wal.LRUDictionary$BidirectionalLRUMap.get(LRUDictionary.java:122)
	at org.apache.hadoop.hbase.regionserver.wal.LRUDictionary$BidirectionalLRUMap.access$000(LRUDictionary.java:69)
	at org.apache.hadoop.hbase.regionserver.wal.LRUDictionary.getEntry(LRUDictionary.java:40)
	at org.apache.hadoop.hbase.regionserver.wal.Compressor.readCompressed(Compressor.java:148)
	at org.apache.hadoop.hbase.regionserver.wal.HLogKey.readFields(HLogKey.java:307)
	at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2199)
	at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2239)
	at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:245)
	at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:214)
	at org.apache.hadoop.hbase.regionserver.wal.HLogPrettyPrinter.processFile(HLogPrettyPrinter.java:242)
	at org.apache.hadoop.hbase.regionserver.wal.HLogPrettyPrinter.run(HLogPrettyPrinter.java:371)
	at org.apache.hadoop.hbase.regionserver.wal.HLog.main(HLog.java:1935)
{noformat}

The exception was the same, but in the hlog detail, we could see that the {color:red}sequence
id '*80485091470*'{color} was missing. Actually, that could be the missing index '*5*' in
the exception.

So, I think there might be some bugs in HLog writing logic that made a 'gap' in the sequence
writing phase. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message