flink-dev mailing list archives

From "Gyula Fora (JIRA)" <j...@apache.org>
Subject [jira] [Created] (FLINK-4193) Task manager JVM crashes while deploying cancelling jobs
Date Mon, 11 Jul 2016 14:12:10 GMT
Gyula Fora created FLINK-4193:
---------------------------------

             Summary: Task manager JVM crashes while deploying cancelling jobs
                 Key: FLINK-4193
                 URL: https://issues.apache.org/jira/browse/FLINK-4193
             Project: Flink
          Issue Type: Bug
          Components: Streaming, TaskManager
            Reporter: Gyula Fora
            Priority: Critical


We have observed several TM crashes while deploying larger stateful streaming jobs that use
the RocksDB state backend.

Because the JVMs crash outright, the logs show nothing useful, but I have uploaded everything
I could capture from the standard output.

The output points to GC problems and possibly to RocksDB issues underneath, but we could not
work out much more than that.

GC segfault:
https://gist.github.com/gyfora/9e56d4a0d4fc285a8d838e1b281ae125

Other crashes (possibly RocksDB-related):
https://gist.github.com/gyfora/525c67c747873f0ff2ff2ed1682efefa
https://gist.github.com/gyfora/b93611fde87b1f2516eeaf6bfbe8d818

The third link shows two issues that happened in parallel.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
