flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From mclendenin <marcusc...@gmail.com>
Subject Re: RocksDB error with flink 1.2.0
Date Fri, 28 Apr 2017 20:00:56 GMT
This is the stacktrace I'm getting when checkpointing to the HDFS. It happens
like once every 3 checkpoints and I don't see this without parallelism. 

AsynchronousException{java.lang.Exception: Could not materialize checkpoint
6 for operator KeyedCEPPatternOperator -> Flat Map -> Map -> Sink: Unnamed
(4/6).}
	at
org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:980)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.Exception: Could not materialize checkpoint 6 for
operator KeyedCEPPatternOperator -> Flat Map -> Map -> Sink: Unnamed (4/6).
	... 6 more
Caused by: java.util.concurrent.ExecutionException: java.io.IOException:
Could not flush and close the file system output stream to
hdfs:///user/hadoop/flink/checkpoints/dc2aee563bebce76e420029525c37892/chk-6/6312cc6f-a60f-4458-8d0b-0455d69d3048
in order to obtain the stream state handle
	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
	at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:40)
	at
org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:915)
	... 5 more
	Suppressed: java.lang.Exception: Could not properly cancel managed keyed
state future.
		at
org.apache.flink.streaming.api.operators.OperatorSnapshotResult.cancel(OperatorSnapshotResult.java:91)
		at
org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.cleanup(StreamTask.java:1010)
		at
org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:974)
		... 5 more
	Caused by: java.util.concurrent.ExecutionException: java.io.IOException:
Could not flush and close the file system output stream to
hdfs:///user/hadoop/flink/checkpoints/dc2aee563bebce76e420029525c37892/chk-6/6312cc6f-a60f-4458-8d0b-0455d69d3048
in order to obtain the stream state handle
		at java.util.concurrent.FutureTask.report(FutureTask.java:122)
		at java.util.concurrent.FutureTask.get(FutureTask.java:192)
		at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:40)
		at
org.apache.flink.runtime.state.StateUtil.discardStateFuture(StateUtil.java:79)
		at
org.apache.flink.streaming.api.operators.OperatorSnapshotResult.cancel(OperatorSnapshotResult.java:89)
		... 7 more
	Caused by: java.io.IOException: Could not flush and close the file system
output stream to
hdfs:///user/hadoop/flink/checkpoints/dc2aee563bebce76e420029525c37892/chk-6/6312cc6f-a60f-4458-8d0b-0455d69d3048
in order to obtain the stream state handle
		at
org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.closeAndGetHandle(FsCheckpointStreamFactory.java:333)
		at
org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBSnapshotOperation.closeSnapshotStreamAndGetHandle(RocksDBKeyedStateBackend.java:580)
		at
org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBSnapshotOperation.closeCheckpointStream(RocksDBKeyedStateBackend.java:410)
		at
org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$1.performOperation(RocksDBKeyedStateBackend.java:298)
		at
org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$1.performOperation(RocksDBKeyedStateBackend.java:277)
		at
org.apache.flink.runtime.io.async.AbstractAsyncIOCallable.call(AbstractAsyncIOCallable.java:72)
		at java.util.concurrent.FutureTask.run(FutureTask.java:266)
		at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:37)
		at
org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:915)
		... 5 more
	Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
/user/hadoop/flink/checkpoints/dc2aee563bebce76e420029525c37892/chk-6/6312cc6f-a60f-4458-8d0b-0455d69d3048
could only be replicated to 0 nodes instead of minReplication (=1).  There
are 3 datanode(s) running and no node(s) are excluded in this operation.
	at
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1547)
	at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3107)
	at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3031)
	at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:724)
	at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:492)
	at
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)

		at org.apache.hadoop.ipc.Client.call(Client.java:1475)
		at org.apache.hadoop.ipc.Client.call(Client.java:1412)
		at
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
		at com.sun.proxy.$Proxy12.addBlock(Unknown Source)
		at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:418)
		at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
		at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
		at java.lang.reflect.Method.invoke(Method.java:498)
		at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
		at
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
		at com.sun.proxy.$Proxy13.addBlock(Unknown Source)
		at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1459)
		at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1255)
		at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449)
	[CIRCULAR REFERENCE:java.io.IOException: Could not flush and close the file
system output stream to
hdfs:///user/hadoop/flink/checkpoints/dc2aee563bebce76e420029525c37892/chk-6/6312cc6f-a60f-4458-8d0b-0455d69d3048
in order to obtain the stream state handle]



--
View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/RocksDB-error-with-flink-1-2-0-tp12897p12907.html
Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.

Mime
View raw message