hbase-user mailing list archives

From Jan Lukavský <jan.lukav...@firma.seznam.cz>
Subject ScannerTimeoutException during MapReduce
Date Thu, 11 Aug 2011 13:58:45 GMT

We've recently moved from HBase 0.20.6 to 0.90.3 (cdh3u1), which 
resolved most of our previous issues, but we are now seeing many more 
ScannerTimeoutExceptions than before. All of these exceptions come from 
a stack trace like this:

org.apache.hadoop.hbase.client.ScannerTimeoutException: 307127ms passed since the last invocation,
timeout is currently set to 60000
	at org.apache.hadoop.hbase.client.HTable$ClientScanner.next(HTable.java:1133)
	at org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.nextKeyValue(TableRecordReaderImpl.java:143)
	at org.apache.hadoop.hbase.mapreduce.TableRecordReader.nextKeyValue(TableRecordReader.java:142)
	at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:456)
	at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)

After a bit of investigation, I suspect the cause is that the first 
call to scanner.next() after HTable.getScanner() times out. What could 
cause this? I see neither regions moving around in the cluster nor any 
compaction on the regionserver side. As far as I can tell, everything 
looks fine. This would suggest that it took too long to locate the 
regionserver in the call to HTable.getScanner(), but I cannot see any 
reason for that. Could this issue be resolved on the TableRecordReader 
side? E.g., at TableRecordReaderImpl.java:143 the 
ScannerTimeoutException could be caught and the scanner restarted a 
few more times (with a configurable limit, say).
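
To make the retry idea concrete, here is a minimal sketch of a bounded 
restart loop. It does not use the HBase API: RowScanner, ScannerFactory, 
RetryingReader and RetryDemo are hypothetical stand-ins, and the timeout 
is simulated with a plain IOException. The retry limit is a constructor 
argument here; in a real reader it would presumably come from the job 
configuration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a scanner; NOT an HBase class.
interface RowScanner {
    String next() throws IOException; // returns null when exhausted
}

// Hypothetical factory that (re)opens a scanner positioned after the given
// row, or at the beginning when startAfter is null.
interface ScannerFactory {
    RowScanner open(String startAfter) throws IOException;
}

class RetryingReader {
    private final ScannerFactory factory;
    private final int maxRetries;
    private RowScanner scanner;
    private String lastRow; // last row successfully returned, null if none yet

    RetryingReader(ScannerFactory factory, int maxRetries) throws IOException {
        this.factory = factory;
        this.maxRetries = maxRetries;
        this.scanner = factory.open(null);
    }

    String next() throws IOException {
        for (int attempt = 0; ; attempt++) {
            try {
                String row = scanner.next();
                if (row != null) {
                    lastRow = row;
                }
                return row;
            } catch (IOException e) {
                if (attempt >= maxRetries) {
                    throw e; // give up after the configured number of retries
                }
                // Reopen after the last row that was actually delivered.
                scanner = factory.open(lastRow);
            }
        }
    }
}

class RetryDemo {
    // Reads all rows while the first `failures` calls to next() time out.
    static List<String> readAll(List<String> rows, int failures, int maxRetries) {
        int[] remainingFailures = {failures};
        ScannerFactory factory = startAfter -> {
            int from = (startAfter == null) ? 0 : rows.indexOf(startAfter) + 1;
            Iterator<String> it = rows.subList(from, rows.size()).iterator();
            return () -> {
                if (remainingFailures[0] > 0) {
                    remainingFailures[0]--;
                    throw new IOException("simulated scanner timeout");
                }
                return it.hasNext() ? it.next() : null;
            };
        };
        try {
            RetryingReader reader = new RetryingReader(factory, maxRetries);
            List<String> out = new ArrayList<>();
            for (String row = reader.next(); row != null; row = reader.next()) {
                out.add(row);
            }
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Two simulated timeouts, up to three retries: no row is lost.
        System.out.println(readAll(Arrays.asList("r1", "r2", "r3"), 2, 3));
    }
}
```

Because the sketch reopens the scanner *after* lastRow, no extra 
next() call is needed to skip an already delivered row, and a timeout 
on the very first call simply restarts from the beginning.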

After looking at the code, it also seems to me that there may be a bug 
causing the reader to skip the first row of a region. The scenario is as 
follows:
  - the reader is initialized with TableRecordReader.init()
  - then nextKeyValue() is called, which calls scanner.next(); this is 
where the ScannerTimeoutException occurs
  - the scanner is restarted by a call to restart(), and then *two* calls 
to scanner.next() follow, so the first row is lost
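
The scenario above can be reproduced with a small in-memory model. This 
is only a simplified reading of the recovery path as described here, not 
the actual HBase source: FakeScanner, SimulatedTimeout and the guardSkip 
flag are hypothetical, and the restart is assumed to begin at lastRow 
inclusive (or at the start of the region when no row has been read yet).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class SkipFirstRowDemo {
    // Simulated timeout; unchecked so the recovery path stays short.
    static final class SimulatedTimeout extends RuntimeException {
        SimulatedTimeout() { super("simulated scanner timeout"); }
    }

    // Hypothetical in-memory stand-in for a scan over one region.
    static final class FakeScanner {
        private final Iterator<String> it;
        private boolean failNext;

        FakeScanner(List<String> rows, String startRow, boolean failFirstCall) {
            // A null start row means "start of the region".
            int from = (startRow == null) ? 0 : Math.max(rows.indexOf(startRow), 0);
            this.it = rows.subList(from, rows.size()).iterator();
            this.failNext = failFirstCall;
        }

        String next() {
            if (failNext) {
                failNext = false;
                throw new SimulatedTimeout();
            }
            return it.hasNext() ? it.next() : null;
        }
    }

    // Reads all rows; the very first next() call times out.
    // guardSkip == false models the suspected behaviour (always skip one row
    // after a restart); guardSkip == true skips only if a row was already read.
    static List<String> read(List<String> rows, boolean guardSkip) {
        List<String> seen = new ArrayList<>();
        String lastRow = null;
        FakeScanner scanner = new FakeScanner(rows, null, true);
        while (true) {
            String value;
            try {
                value = scanner.next();
            } catch (SimulatedTimeout e) {
                scanner = new FakeScanner(rows, lastRow, false); // restart
                if (!guardSkip || lastRow != null) {
                    scanner.next(); // skip the row presumed already mapped
                }
                value = scanner.next();
            }
            if (value == null) {
                return seen;
            }
            seen.add(value);
            lastRow = value;
        }
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("a", "b", "c");
        System.out.println(read(rows, false)); // first row "a" is missing
        System.out.println(read(rows, true));  // all rows are returned
    }
}
```

In this model, skipping one row after a restart is only correct when 
lastRow is already set; when the timeout hits the first call, nothing 
has been mapped yet and the skip drops the first row.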

Can anyone confirm this?

