flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From gerardg <ger...@talaia.io>
Subject Flink job hangs/deadlocks (possibly related to out of memory)
Date Fri, 29 Jun 2018 15:40:42 GMT
Hello,We have experienced some problems where a task just hangs without
showing any kind of log error while other tasks running in the same task
manager continue without problems. When these tasks are restarted the task
manager gets killed and shows several errors similar to these
ones:[Canceler/Interrupts for (...)' did not react to cancelling signal for
30 seconds, but is stuck in method:
java.nio.ByteBuffer.wrap(ByteBuffer.java:373)java.nio.ByteBuffer.wrap(ByteBuffer.java:396)org.apache.flink.core.memory.DataOutputSerializer.resize(DataOutputSerializer.java:330)org.apache.flink.core.memory.DataOutputSerializer.writeInt(DataOutputSerializer.java:212)org.apache.flink.api.common.typeutils.base.IntSerializer.serialize(IntSerializer.java:63)org.apache.flink.api.common.typeutils.base.IntSerializer.serialize(IntSerializer.java:27)org.apache.flink.api.scala.typeutils.CaseClassSerializer.serialize(CaseClassSerializer.scala:113)org.apache.flink.api.scala.typeutils.CaseClassSerializer.serialize(CaseClassSerializer.scala:32)org.apache.flink.api.scala.typeutils.TraversableSerializer$$anonfun$serialize$1.apply(TraversableSerializer.scala:98)org.apache.flink.api.scala.typeutils.TraversableSerializer$$anonfun$serialize$1.apply(TraversableSerializer.scala:93)scala.collection.immutable.List.foreach(List.scala:392)org.apache.flink.api.scala.typeutils.TraversableSerializer.serialize(TraversableSerializer.scala:93)org.apache.flink.api.scala.typeutils.TraversableSerializer.serialize(TraversableSerializer.scala:33)org.apache.flink.api.scala.typeutils.CaseClassSerializer.serialize(CaseClassSerializer.scala:113)org.apache.flink.api.scala.typeutils.CaseClassSerializer.serialize(CaseClassSerializer.scala:32)org.apache.flink.streaming.runtime.streamrecord.StreamElementSerializer.serialize(StreamElementSerializer.java:177)org.apache.flink.streaming.runtime.streamrecord.StreamElementSerializer.serialize(StreamElementSerializer.java:49)org.apache.flink.runtime.plugable.SerializationDelegate.write(SerializationDelegate.java:54)org.apache.flink.runtime.io.network.api.serialization.SpanningRecordSerializer.addRecord(SpanningRecordSerializer.java:88)org.apache.flink.runtime.io.network.api.writer.RecordWriter.sendToTarget(RecordWriter.java:129)org.apache.flink.runtime.io.network.api.writer.RecordWriter.emit(RecordWriter.java:105)org.apache.flink.streaming.runtime.io.StreamRecordWriter.emit(StreamRecordWriter.java:81)org.apache.flink.streaming.runtime.io.RecordWriterOutput.pushToRecordWriter(RecordWriterOutput.java:107)org.apache.flink.streaming.runtime.io.RecordWriterOutput.collect(RecordWriterOutput.java:89)org.apache.flink.streaming.runtime.io.RecordWriterOutput.collect(RecordWriterOutput.java:45)org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingBroadcastingOutputCollector.collect(OperatorChain.java:667)org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingBroadcastingOutputCollector.collect(OperatorChain.java:653)org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:679)org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:657)org.apache.flink.streaming.api.operators.TimestampedCollector.collect(TimestampedCollector.java:51)(...)org.apache.flink.streaming.api.operators.StreamFlatMap.processElement(StreamFlatMap.java:50)org.apache.flink.streaming.runtime.tasks.OperatorChain$ChainingOutput.pushToOperator(OperatorChain.java:469)org.apache.flink.streaming.runtime.tasks.OperatorChain$ChainingOutput.collect(OperatorChain.java:446)org.apache.flink.streaming.runtime.tasks.OperatorChain$ChainingOutput.collect(OperatorChain.java:405)org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingBroadcastingOutputCollector.collect(OperatorChain.java:672)org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingBroadcastingOutputCollector.collect(OperatorChain.java:653)org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:679)org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:657)org.apache.flink.streaming.api.operators.TimestampedCollector.collect(TimestampedCollector.java:51)(...)org.apache.flink.streaming.api.scala.function.util.ScalaProcessWindowFunctionWrapper.process(ScalaProcessWindowFunctionWrapper.scala:63)org.apache.flink.streaming.runtime.operators.windowing.functions.InternalIterableProcessWindowFunction.process(InternalIterableProcessWindowFunction.java:50)org.apache.flink.streaming.runtime.operators.windowing.functions.InternalIterableProcessWindowFunction.process(InternalIterableProcessWindowFunction.java:32)org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.emitWindowContents(WindowOperator.java:550)org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.processElement(WindowOperator.java:403)org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:202)org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:103)org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:306)org.apache.flink.runtime.taskmanager.Task.run(Task.java:703)java.lang.Thread.run(Thread.java:748)08:23:56.197
[Canceler/Interrupts for (...)' did not react to cancelling signal for 30
seconds, but is stuck in method:
org.apache.flink.core.memory.DataOutputSerializer.resize(DataOutputSerializer.java:305)org.apache.flink.core.memory.DataOutputSerializer.writeInt(DataOutputSerializer.java:212)org.apache.flink.api.common.typeutils.base.IntSerializer.serialize(IntSerializer.java:63)org.apache.flink.api.common.typeutils.base.IntSerializer.serialize(IntSerializer.java:27)org.apache.flink.api.scala.typeutils.CaseClassSerializer.serialize(CaseClassSerializer.scala:113)org.apache.flink.api.scala.typeutils.CaseClassSerializer.serialize(CaseClassSerializer.scala:32)org.apache.flink.api.scala.typeutils.CaseClassSerializer.serialize(CaseClassSerializer.scala:113)org.apache.flink.api.scala.typeutils.CaseClassSerializer.serialize(CaseClassSerializer.scala:32)org.apache.flink.api.scala.typeutils.TraversableSerializer$$anonfun$serialize$1.apply(TraversableSerializer.scala:98)org.apache.flink.api.scala.typeutils.TraversableSerializer$$anonfun$serialize$1.apply(TraversableSerializer.scala:93)scala.collection.immutable.List.foreach(List.scala:392)org.apache.flink.api.scala.typeutils.TraversableSerializer.serialize(TraversableSerializer.scala:93)org.apache.flink.api.scala.typeutils.TraversableSerializer.serialize(TraversableSerializer.scala:33)org.apache.flink.api.scala.typeutils.CaseClassSerializer.serialize(CaseClassSerializer.scala:113)org.apache.flink.api.scala.typeutils.CaseClassSerializer.serialize(CaseClassSerializer.scala:32)org.apache.flink.streaming.runtime.streamrecord.StreamElementSerializer.serialize(StreamElementSerializer.java:177)org.apache.flink.streaming.runtime.streamrecord.StreamElementSerializer.serialize(StreamElementSerializer.java:49)org.apache.flink.runtime.plugable.SerializationDelegate.write(SerializationDelegate.java:54)org.apache.flink.runtime.io.network.api.serialization.SpanningRecordSerializer.addRecord(SpanningRecordSerializer.java:88)org.apache.flink.runtime.io.network.api.writer.RecordWriter.sendToTarget(RecordWriter.java:129)org.apache.flink.runtime.io.network.api.writer.RecordWriter.emit(RecordWriter.java:105)org.apache.flink.streaming.runtime.io.StreamRecordWriter.emit(StreamRecordWriter.java:81)org.apache.flink.streaming.runtime.io.RecordWriterOutput.pushToRecordWriter(RecordWriterOutput.java:107)org.apache.flink.streaming.runtime.io.RecordWriterOutput.collect(RecordWriterOutput.java:89)org.apache.flink.streaming.runtime.io.RecordWriterOutput.collect(RecordWriterOutput.java:45)org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingBroadcastingOutputCollector.collect(OperatorChain.java:667)org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingBroadcastingOutputCollector.collect(OperatorChain.java:653)org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:679)org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:657)org.apache.flink.streaming.api.operators.TimestampedCollector.collect(TimestampedCollector.java:51)(...)org.apache.flink.streaming.api.operators.StreamFlatMap.processElement(StreamFlatMap.java:50)org.apache.flink.streaming.runtime.tasks.OperatorChain$ChainingOutput.pushToOperator(OperatorChain.java:469)org.apache.flink.streaming.runtime.tasks.OperatorChain$ChainingOutput.collect(OperatorChain.java:446)org.apache.flink.streaming.runtime.tasks.OperatorChain$ChainingOutput.collect(OperatorChain.java:405)org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingBroadcastingOutputCollector.collect(OperatorChain.java:672)org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingBroadcastingOutputCollector.collect(OperatorChain.java:653)org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:679)org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:657)org.apache.flink.streaming.api.operators.TimestampedCollector.collect(TimestampedCollector.java:51)(...)org.apache.flink.streaming.api.scala.function.util.ScalaProcessWindowFunctionWrapper.process(ScalaProcessWindowFunctionWrapper.scala:63)org.apache.flink.streaming.runtime.operators.windowing.functions.InternalIterableProcessWindowFunction.process(InternalIterableProcessWindowFunction.java:50)org.apache.flink.streaming.runtime.operators.windowing.functions.InternalIterableProcessWindowFunction.process(InternalIterableProcessWindowFunction.java:32)org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.emitWindowContents(WindowOperator.java:550)org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.processElement(WindowOperator.java:403)org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:202)org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:103)org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:306)org.apache.flink.runtime.taskmanager.Task.run(Task.java:703)java.lang.Thread.run(Thread.java:748)Our
task bundles several thousand of messages together so it creates some big
single messages which could explain why the operator hangs trying to
serialize the message. Our problem is that when a task hangs is very
difficult to detect and we have to manually cancel and restart it. Is there
any way to make the task manager fail or to increase the memory required by
the allocation?Thanks,Gerard



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
Mime
View raw message