hbase-dev mailing list archives

From "Victor Xu (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (HBASE-9646) SplitLogWorker throws IndexOutOfBoundsException while reading compressed hlog
Date Sun, 06 Oct 2013 00:36:41 GMT

     [ https://issues.apache.org/jira/browse/HBASE-9646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Victor Xu resolved HBASE-9646.
------------------------------

      Resolution: Duplicate
    Release Note: 
Duplicate of HBASE-9645; both issues are caused by the HRegion's updatesLock.

> SplitLogWorker throws IndexOutOfBoundsException while reading compressed hlog
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-9646
>                 URL: https://issues.apache.org/jira/browse/HBASE-9646
>             Project: HBase
>          Issue Type: Bug
>          Components: wal
>    Affects Versions: 0.94.10
>         Environment: Linux 2.6.32-el5.x86_64
>            Reporter: Victor Xu
>
> This case happened right after a regionserver halt ([HBASE-9645|https://issues.apache.org/jira/browse/HBASE-9645]).
> As we know, when an RS goes down, the master directs other live RSes to split the hlogs left behind by the dead RS. After the splitting finishes, it reassigns the regions to other live RSes.
> In this case, one of the dead RS's hlog files could not be split. The master resubmitted the splitting task to different RSes several times, but all of them failed.
> I checked one of those RSes' regionserver logs and found a read error:
> {noformat}
> 2013-09-19 15:54:53,082 INFO org.apache.hadoop.hbase.regionserver.SplitLogWorker: worker hadoop250.cm6,60020,1378886880990 acquired task /hbase/splitlog/hdfs%3A%2F%2Fhadoopnnvip.cm6%3A9000%2Fhbase%2F.logs%2Fhadoop230.cm6%2C60020%2C1378886880970-splitting%2Fhadoop230.cm6%252C60020%252C1378886880970.1379572773730
> 2013-09-19 15:54:53,083 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting hlog: hdfs://hadoopnnvip.cm6:9000/hbase/.logs/hadoop230.cm6,60020,1378886880970-splitting/hadoop230.cm6%2C60020%2C1378886880970.1379572773730, length=1169560315
> 2013-09-19 15:54:53,083 INFO org.apache.hadoop.hbase.util.FSHDFSUtils: Recovering file hdfs://hadoopnnvip.cm6:9000/hbase/.logs/hadoop230.cm6,60020,1378886880970-splitting/hadoop230.cm6%2C60020%2C1378886880970.1379572773730
> 2013-09-19 15:54:53,084 INFO org.apache.hadoop.hbase.util.FSHDFSUtils: Finished lease recovery attempt for hdfs://hadoopnnvip.cm6:9000/hbase/.logs/hadoop230.cm6,60020,1378886880970-splitting/hadoop230.cm6%2C60020%2C1378886880970.1379572773730
> 2013-09-19 15:54:53,166 WARN org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Found existing old edits file. It could be the result of a previous failed split attempt. Deleting hdfs://hadoopnnvip.cm6:9000/hbase/page_content_queue/1af5b36fd3695e18ef85ea6d768a3e45/recovered.edits/0000000080485091467, length=71240
> 2013-09-19 15:54:53,177 WARN org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Found existing old edits file. It could be the result of a previous failed split attempt. Deleting hdfs://hadoopnnvip.cm6:9000/hbase/offline_dashboard_queue/2514b6f61f05a2616f1e3dd893a56da6/recovered.edits/0000000080485091468, length=1242
> 2013-09-19 15:54:53,188 WARN org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Found existing old edits file. It could be the result of a previous failed split attempt. Deleting hdfs://hadoopnnvip.cm6:9000/hbase/taobao_auction_rowkey_queue/45ba2b931f94d4d453977ea16bccda6b/recovered.edits/0000000080485091472, length=704
> 2013-09-19 15:54:53,204 WARN org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Found existing old edits file. It could be the result of a previous failed split attempt. Deleting hdfs://hadoopnnvip.cm6:9000/hbase/shangpin_alt_taoke_doc_queue/aabc3dc0ab1c9c1ead1239b6be4234ed/recovered.edits/0000000080485091471, length=8998
> 2013-09-19 15:54:53,216 WARN org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Found existing old edits file. It could be the result of a previous failed split attempt. Deleting hdfs://hadoopnnvip.cm6:9000/hbase/link_rowkey/eb004b12a0e1d3f467cadb19a53ca29e/recovered.edits/0000000080485091462, length=612
> 2013-09-19 15:54:53,219 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Processed 13 edits across 5 regions threw away edits for 0 regions; log file=hdfs://hadoopnnvip.cm6:9000/hbase/.logs/hadoop230.cm6,60020,1378886880970-splitting/hadoop230.cm6%2C60020%2C1378886880970.1379572773730 is corrupted = false progress failed = false
> 2013-09-19 15:54:53,219 INFO org.apache.hadoop.hbase.regionserver.SplitLogWorker: worker hadoop250.cm6,60020,1378886880990 done with task /hbase/splitlog/hdfs%3A%2F%2Fhadoopnnvip.cm6%3A9000%2Fhbase%2F.logs%2Fhadoop230.cm6%2C60020%2C1378886880970-splitting%2Fhadoop230.cm6%252C60020%252C1378886880970.1379572773730 in 137ms
> 2013-09-19 15:54:53,219 ERROR org.apache.hadoop.hbase.regionserver.SplitLogWorker: unexpected error
> java.lang.IndexOutOfBoundsException: index (5) must be less than size (5)
>         at com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:301)
>         at com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:280)
>         at org.apache.hadoop.hbase.regionserver.wal.LRUDictionary$BidirectionalLRUMap.get(LRUDictionary.java:122)
>         at org.apache.hadoop.hbase.regionserver.wal.LRUDictionary$BidirectionalLRUMap.access$000(LRUDictionary.java:69)
>         at org.apache.hadoop.hbase.regionserver.wal.LRUDictionary.getEntry(LRUDictionary.java:40)
>         at org.apache.hadoop.hbase.regionserver.wal.Compressor.readCompressed(Compressor.java:111)
>         at org.apache.hadoop.hbase.regionserver.wal.HLogKey.readFields(HLogKey.java:307)
>         at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2199)
>         at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2239)
>         at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:245)
>         at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:214)
>         at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getNextLogLine(HLogSplitter.java:813)
>         at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:437)
>         at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:381)
>         at org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:112)
>         at org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:280)
>         at org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:211)
>         at org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:179)
>         at java.lang.Thread.run(Thread.java:662)
> 2013-09-19 15:54:53,219 INFO org.apache.hadoop.hbase.regionserver.SplitLogWorker: SplitLogWorker hadoop250.cm6,60020,1378886880990 exiting
> {noformat}
> I then used the HLog dump tool to analyze the hlog file. Here is what I got:
> {noformat}
> $ hbase org.apache.hadoop.hbase.regionserver.wal.HLog --dump /user/hadoop/hbase_admin/logs/hadoop230.cm6%2C60020%2C1378886880970.1379572773730
> 13/09/24 14:24:38 WARN conf.Configuration: fs.default.name is deprecated. Instead, use fs.defaultFS
> Sequence 80485091459 from region aabc3dc0ab1c9c1ead1239b6be4234ed in table shangpin_alt_taoke_doc_queue
>   Action:
>     row: 00015:0000001379572773802:00001
>     column: message:789
>     at time: Thu Sep 19 14:39:33 CST 2013
> Sequence 80485091460 from region aabc3dc0ab1c9c1ead1239b6be4234ed in table shangpin_alt_taoke_doc_queue
>   Action:
>     row: 00015:0000001379572773799:00001
>     column: message:789
>     at time: Thu Sep 19 14:39:33 CST 2013
> Sequence 80485091461 from region eb004b12a0e1d3f467cadb19a53ca29e in table link_rowkey
>   Action:
>     row: 191:1379572773794
>     column: link_crawler:15acca08d58e374208eb603aaa4c58e8_1379572773791_345445
>     at time: Thu Sep 19 14:39:33 CST 2013
> Sequence 80485091462 from region eb004b12a0e1d3f467cadb19a53ca29e in table link_rowkey
>   Action:
>     row: 191:1379572773793
>     column: link_crawler:f8a52947d7a0e794204f118fc7c5fb5a_1379572773791_916104
>     at time: Thu Sep 19 14:39:33 CST 2013
> Sequence 80485091463 from region 1af5b36fd3695e18ef85ea6d768a3e45 in table page_content_queue
>   Action:
>     row: 00039:0000001379572773776:00001
>     column: message:content
>     at time: Thu Sep 19 14:39:33 CST 2013
> Sequence 80485091464 from region 2514b6f61f05a2616f1e3dd893a56da6 in table offline_dashboard_queue
>   Action:
>     row: 00082:0000001379572773772:00001
>     column: message:parser
>     at time: Thu Sep 19 14:39:33 CST 2013
> Sequence 80485091465 from region 2514b6f61f05a2616f1e3dd893a56da6 in table offline_dashboard_queue
>   Action:
>     row: 00082:0000001379572773770:00001
>     column: message:parser
>     at time: Thu Sep 19 14:39:33 CST 2013
> Sequence 80485091466 from region 45ba2b931f94d4d453977ea16bccda6b in table taobao_auction_rowkey_queue
>   Action:
>     row: 00095:0000001379572773768:00001
>     column: message:taobao_auction
>     at time: Thu Sep 19 14:39:33 CST 2013
>   Action:
>     row: 00095:0000001379572773768:00002
>     column: message:taobao_auction
>     at time: Thu Sep 19 14:39:33 CST 2013
>   Action:
>     row: 00095:0000001379572773768:00003
>     column: message:taobao_auction
>     at time: Thu Sep 19 14:39:33 CST 2013
> Sequence 80485091467 from region 1af5b36fd3695e18ef85ea6d768a3e45 in table page_content_queue
>   Action:
>     row: 00039:0000001379572773766:00001
>     column: message:content
>     at time: Thu Sep 19 14:39:33 CST 2013
> Sequence 80485091468 from region 2514b6f61f05a2616f1e3dd893a56da6 in table offline_dashboard_queue
>   Action:
>     row: 00082:0000001379572773759:00001
>     column: message:parser
>     at time: Thu Sep 19 14:39:33 CST 2013
> Sequence 80485091469 from region aabc3dc0ab1c9c1ead1239b6be4234ed in table shangpin_alt_taoke_doc_queue
>   Action:
>     row: 00015:0000001379572773756:00001
>     column: message:789
>     at time: Thu Sep 19 14:39:33 CST 2013
> Sequence 80485091471 from region aabc3dc0ab1c9c1ead1239b6be4234ed in table shangpin_alt_taoke_doc_queue
>   Action:
>     row: 00015:0000001379572773755:00001
>     column: message:789
>     at time: Thu Sep 19 14:39:33 CST 2013
> Sequence 80485091472 from region 45ba2b931f94d4d453977ea16bccda6b in table taobao_auction_rowkey_queue
>   Action:
>     row: 00095:0000001379572773754:00001
>     column: message:taobao_auction
>     at time: Thu Sep 19 14:39:33 CST 2013
>   Action:
>     row: 00095:0000001379572773754:00002
>     column: message:taobao_auction
>     at time: Thu Sep 19 14:39:33 CST 2013
>   Action:
>     row: 00095:0000001379572773754:00003
>     column: message:taobao_auction
>     at time: Thu Sep 19 14:39:33 CST 2013
> Exception in thread "main" java.lang.IndexOutOfBoundsException: index (5) must be less than size (5)
> 	at com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:301)
> 	at com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:280)
> 	at org.apache.hadoop.hbase.regionserver.wal.LRUDictionary$BidirectionalLRUMap.get(LRUDictionary.java:122)
> 	at org.apache.hadoop.hbase.regionserver.wal.LRUDictionary$BidirectionalLRUMap.access$000(LRUDictionary.java:69)
> 	at org.apache.hadoop.hbase.regionserver.wal.LRUDictionary.getEntry(LRUDictionary.java:40)
> 	at org.apache.hadoop.hbase.regionserver.wal.Compressor.readCompressed(Compressor.java:148)
> 	at org.apache.hadoop.hbase.regionserver.wal.HLogKey.readFields(HLogKey.java:307)
> 	at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2199)
> 	at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2239)
> 	at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:245)
> 	at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:214)
> 	at org.apache.hadoop.hbase.regionserver.wal.HLogPrettyPrinter.processFile(HLogPrettyPrinter.java:242)
> 	at org.apache.hadoop.hbase.regionserver.wal.HLogPrettyPrinter.run(HLogPrettyPrinter.java:371)
> 	at org.apache.hadoop.hbase.regionserver.wal.HLog.main(HLog.java:1935)
> {noformat}
> The exception is the same, but the hlog dump shows that the {color:red}sequence id '*80485091470*'{color} is missing. That lost edit most likely corresponds to the missing index '*5*' in the exception.
> So I suspect there is a bug in the HLog writing logic that left a 'gap' in the sequence of written edits.
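As a side note on why a single missing edit would surface as this exact exception: WAL compression builds an LRU dictionary of repeated values on both the writer and the reader sides, in entry order, so the two dictionaries only stay in sync if the reader sees every entry. The following toy sketch (assumed names, not actual HBase or Guava code) reproduces the failure shape: the reader replays only 5 entries (slots 0..4), then hits a reference to slot 5 from an entry written after the lost one.

```java
import java.util.ArrayList;
import java.util.List;

public class DictGapSketch {

    // Mimics the bounds check that Preconditions.checkElementIndex
    // performs inside LRUDictionary$BidirectionalLRUMap.get.
    static String lookup(List<String> dict, int index) {
        if (index < 0 || index >= dict.size()) {
            throw new IndexOutOfBoundsException(
                "index (" + index + ") must be less than size (" + dict.size() + ")");
        }
        return dict.get(index);
    }

    // Replays 5 entries into the reader-side dictionary, then resolves
    // a back-reference to slot 5, whose defining entry was lost.
    static String demo() {
        List<String> dict = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            dict.add("entry-" + i);   // reader saw only entries 0..4
        }
        try {
            return lookup(dict, 5);   // written after the gap, references slot 5
        } catch (IndexOutOfBoundsException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

This matches the observed behavior: every reader (SplitLogWorker and the dump tool alike) fails at the first entry that references the never-seen dictionary slot, which is why resubmitting the split task to other RSes could not help.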



--
This message was sent by Atlassian JIRA
(v6.1#6144)
