flink-user mailing list archives

From Flavio Pompermaier <pomperma...@okkam.it>
Subject Re: ScannerTimeout over long running process
Date Thu, 27 Nov 2014 17:10:16 GMT
Could it be that the TaskManager sometimes pauses for a long time between
one inputFormat.nextRecord() call and the next?
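
One way to check this would be to record the longest gap between successive
nextRecord() calls and compare it with the 2387347 ms reported in the trace.
Below is a rough sketch using only the JDK. CallGapTracker, mark() and
maxGapMillis() are names I made up for illustration; they are not part of
Flink or HBase, and the injected clock is only there so the class can be
tried without waiting.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical helper (not a Flink or HBase class): remembers when it was
 * last called and tracks the longest gap seen between two calls.
 */
final class CallGapTracker {
    private final LongSupplier clockMillis;   // e.g. System::currentTimeMillis
    private final long warnThresholdMillis;   // e.g. the scanner timeout
    private boolean seen = false;             // false until the first call
    private long lastCall = 0L;
    private long maxGap = 0L;

    CallGapTracker(LongSupplier clockMillis, long warnThresholdMillis) {
        this.clockMillis = clockMillis;
        this.warnThresholdMillis = warnThresholdMillis;
    }

    /** Call at the top of nextRecord(); returns the gap since the previous call (0 on the first call). */
    long mark() {
        long now = clockMillis.getAsLong();
        long gap = seen ? now - lastCall : 0L;
        seen = true;
        lastCall = now;
        if (gap > maxGap) {
            maxGap = gap;
        }
        if (gap > warnThresholdMillis) {
            System.err.println("Gap of " + gap + " ms between nextRecord() calls");
        }
        return gap;
    }

    /** Longest gap observed so far, in milliseconds. */
    long maxGapMillis() {
        return maxGap;
    }
}
```

In the job it could be created as new CallGapTracker(System::currentTimeMillis,
900000L) and mark() called first thing in nextRecord(). If the longest gap
stays in the millisecond range while the exception still shows up, the pause
is probably somewhere else.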

On Thu, Nov 27, 2014 at 3:44 PM, Stefano Bortoli <s.bortoli@gmail.com>
wrote:

> hi all,
>
> I am facing an odd issue while running a fairly complex duplicate
> detection process.
>
> The code runs like a charm on a dataset of a million with few duplicates
> (3 minutes), but hits the scanner timeout over a dataset of 9.2M.
>
> The problem happens randomly, and I don't think it is related to the
> business logic or, for that matter, to the scan configuration.
>
> The caching block is set to 100, and the scan timeout is 900,000
> milliseconds (15 min). The job normally runs in around 0.5 seconds on
> 100 entries, so I must be hitting something deep, something
> related to how Hadoop and HBase work together.
>
> My problem is that it may fail or it may not. Yesterday I could complete
> the whole scan without problems, then the job failed over another error.
> Today, the same code failed after 3.5h, a little before completion of the
> first phase.
>
> I think it may be something about GC.
>
> I log the execution time of every single map, and everything finishes
> within milliseconds. Even so, the exception happens (I catch it, print it,
> and throw it again).
>
> Any idea of where the issue could be?
>
> thanks a lot for the support. Stack trace appended.
>
> saluti,
> Stefano
>
> Error: org.apache.hadoop.hbase.client.ScannerTimeoutException: 2387347ms
> passed since the last invocation, timeout is currently set to 900000
> at
> org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:352)
> at
> org.apache.flink.addons.hbase.TableInputFormat.nextRecord(TableInputFormat.java:106)
> at
> org.apache.flink.addons.hbase.TableInputFormat.nextRecord(TableInputFormat.java:48)
> at
> org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:195)
> at
> org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:246)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hbase.UnknownScannerException:
> org.apache.hadoop.hbase.UnknownScannerException: Name: 291, already closed?
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3043)
> at
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29497)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2012)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
> at
> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160)
> at
> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38)
> at
> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110)
> at java.lang.Thread.run(Thread.java:745)
>
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
> at
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
> at
> org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
> at
> org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
> at
> org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:283)
> at
> org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:198)
> at
> org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:57)
> at
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:114)
> at
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:90)
> at
> org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:336)
> ... 5 more
> Caused by:
> org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.UnknownScannerException):
> org.apache.hadoop.hbase.UnknownScannerException: Name: 291, already closed?
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3043)
> at
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29497)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2012)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
> at
> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160)
> at
> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38)
> at
> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110)
> at java.lang.Thread.run(Thread.java:745)
>
> at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1458)
> at
> org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1662)
> at
> org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1720)
> at
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:29900)
> at
> org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:168)
>
