flink-user mailing list archives

From Malte Schwarzer <impres...@mieo.de>
Subject TaskManager randomly dies
Date Fri, 27 Jan 2017 15:13:28 GMT
Hi all,

when I run a Flink batch job, a TaskManager occasionally dies at random,
which makes the whole job fail. All other nodes then throw
the following exception:

Error obtaining the sorted input: Thread 'SortMerger Reading Thread'
terminated due to an exception: Connection unexpectedly closed by remote
task manager 'dyingnode' ...

However, there are no error messages in the log of 'dyingnode'.

But in the JVM fatal error log (hs_err_pid file) of 'dyingnode' I found this:

# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGBUS (0x7) at pc=0x00003fff701afa4c, pid=1119228, tid=0x00003ff38a3ff1b0
#
# JRE version: OpenJDK Runtime Environment (8.0_101-b14) (build 1.8.0_101-b14)
# Java VM: OpenJDK 64-Bit Server VM (25.101-b14 mixed mode linux-ppc64 )
# Problematic frame:
# J 433 C2 org.apache.flink.runtime.util.DataOutputSerializer.write(I)V (40 bytes) @ 0x00003fff701afa4c [0x00003fff701afa00+0x4c]
# ...

What could cause this? Is it related to Flink?


Best regards,
Malte
