hbase-user mailing list archives

From Jan Lukavský <jan.lukav...@firma.seznam.cz>
Subject Re: ScannerTimeoutException during MapReduce
Date Fri, 12 Aug 2011 09:55:28 GMT
Hi J-D,

thanks for the reply.

On 11.8.2011 23:37, Jean-Daniel Cryans wrote:
> Usual reasons would be like a mix of taking a long time to process
> rows in the mapper and scanners that grab a lot of rows (using scanner
> caching and maybe filters).
>
> Do you enable DEBUG for HBase in your mapreduce context? This would
> give relevant information like if the client was doing lots of retries
> trying to start scanning.

I'll enable it and try to get more info. Thanks.

> About the bug, can you open a jira?
Sure.
https://issues.apache.org/jira/browse/HBASE-4196

> J-D
>
> On Thu, Aug 11, 2011 at 6:58 AM, Jan Lukavský
> <jan.lukavsky@firma.seznam.cz>  wrote:
>> Hi,
>>
>> we've recently moved to HBase 0.90.3 (cdh3u1) from 0.20.6, which resolved
>> most of our previous issues, but we are now seeing many more
>> ScannerTimeoutExceptions than before. All of these exceptions come with a
>> stack trace like this:
>>
>> org.apache.hadoop.hbase.client.ScannerTimeoutException: 307127ms passed since the last invocation, timeout is currently set to 60000
>>         at org.apache.hadoop.hbase.client.HTable$ClientScanner.next(HTable.java:1133)
>>         at org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.nextKeyValue(TableRecordReaderImpl.java:143)
>>         at org.apache.hadoop.hbase.mapreduce.TableRecordReader.nextKeyValue(TableRecordReader.java:142)
>>         at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:456)
>>         at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
>>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
>>
>>
>> After a bit of investigation, I suspect the cause is that the first
>> call to scanner.next() after HTable.getScanner() times out. What could be
>> the cause of this? I see neither regions moving around in the cluster nor
>> any compaction on the regionserver side. As far as I can tell,
>> everything looks just fine. This would suggest that it took too long to
>> locate the regionserver in the call to HTable.getScanner(), but I cannot
>> see any reason why. Could this issue be resolved on the TableRecordReader
>> side? E.g., at TableRecordReaderImpl.java:143 the ScannerTimeoutException
>> could be caught and the scanner restarted a couple more times (a
>> configurable number, say).
>>
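To make the retry idea concrete, here is a minimal sketch of restarting a bounded number of times. It does not use the real HBase classes: ScanTimeout stands in for ScannerTimeoutException, and nextWithRestarts, its parameters and maxRestarts are hypothetical names chosen only for illustration.

```java
import java.util.function.Supplier;

class BoundedRestartSketch {

    /** Stand-in for ScannerTimeoutException; NOT the HBase class. */
    static class ScanTimeout extends RuntimeException {}

    /**
     * Calls next.get(); on a timeout, runs restart and tries again,
     * giving up (rethrowing) once maxRestarts restarts have been used.
     */
    static <T> T nextWithRestarts(Supplier<T> next, Runnable restart, int maxRestarts) {
        int restarts = 0;
        while (true) {
            try {
                return next.get();
            } catch (ScanTimeout e) {
                if (restarts >= maxRestarts) {
                    throw e; // restart budget exhausted, propagate the timeout
                }
                restarts++;
                restart.run(); // e.g. reopen the scanner at the last seen row
            }
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Toy "scanner" that times out on its first two calls, then returns a row.
        String row = nextWithRestarts(
            () -> {
                if (calls[0]++ < 2) {
                    throw new ScanTimeout();
                }
                return "row-1";
            },
            () -> System.out.println("restarting scanner"),
            3);
        System.out.println(row);
    }
}
```

In a real reader the limit would presumably come from the job configuration, and the restart callback would have to reopen the scanner at the right row.
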
>> After looking at the code, it also seems to me that there may be a bug
>> causing the reader to skip the first row of a region. The scenario is as
>> follows:
>>   - the reader is initialized with TableRecordReader.init()
>>   - then nextKeyValue is called, causing a call to scanner.next() - here
>> the ScannerTimeoutException occurs
>>   - the scanner is restarted by a call to restart() and then *two* calls to
>> scanner.next() occur, so the first row is lost
>>
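The scenario above can be reproduced with a toy model. This is only a sketch of the behaviour as I described it, not the actual TableRecordReaderImpl code: ToyScanner, TimeoutException, readAll and the alwaysSkip flag are hypothetical names, and the guarded variant (skip only when a row has already been delivered) is one possible fix, assumed here for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ScannerRestartSketch {

    /** Stand-in for ScannerTimeoutException; NOT the HBase class. */
    static class TimeoutException extends RuntimeException {}

    /** Toy scanner over a row list; optionally throws on its first next(). */
    static class ToyScanner {
        private final List<String> rows;
        private int pos;
        private boolean failOnce;

        ToyScanner(List<String> rows, String startRow, boolean failOnce) {
            this.rows = rows;
            // A null start row means "scan from the beginning".
            this.pos = startRow == null ? 0 : Math.max(0, rows.indexOf(startRow));
            this.failOnce = failOnce;
        }

        String next() {
            if (failOnce) {
                failOnce = false;
                throw new TimeoutException();
            }
            return pos < rows.size() ? rows.get(pos++) : null;
        }
    }

    /**
     * Reads all rows, restarting the scanner after a timeout.
     * alwaysSkip = true models the reported behaviour: one row is discarded
     * after every restart. alwaysSkip = false discards a row only if
     * something was already delivered (lastRow != null).
     */
    static List<String> readAll(List<String> rows, boolean alwaysSkip) {
        List<String> seen = new ArrayList<>();
        String lastRow = null;
        ToyScanner scanner = new ToyScanner(rows, null, true); // first next() times out
        while (true) {
            String row;
            try {
                row = scanner.next();
            } catch (TimeoutException e) {
                scanner = new ToyScanner(rows, lastRow, false); // restart
                if (alwaysSkip || lastRow != null) {
                    scanner.next(); // skip the row presumed to be already mapped
                }
                row = scanner.next();
            }
            if (row == null) {
                return seen;
            }
            lastRow = row;
            seen.add(row);
        }
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("r1", "r2", "r3");
        System.out.println(readAll(rows, true));  // prints [r2, r3]
        System.out.println(readAll(rows, false)); // prints [r1, r2, r3]
    }
}
```

With the unconditional skip, a timeout on the very first next() drops r1, because no row had been delivered yet when the restart happened.
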
>> Can anyone confirm this?
>>
>> Thanks,
>>   Jan
>>
>>


-- 

Jan Lukavský
programátor
Seznam.cz, a.s.
Radlická 608/2
15000, Praha 5

jan.lukavsky@firma.seznam.cz
http://www.seznam.cz

