samza-dev mailing list archives

From Abhishek Shivanna
Subject [DISCUSS] SEP-3: Heart-beat mechanism between JobCoordinator and all running containers
Date Tue, 25 Apr 2017 01:42:05 GMT
Hi Everyone,

To fix the issue of orphaned (leaked) containers seen when the
YARN NodeManager crashes, I have created a SEP discussing the design for
implementing a heartbeat between the containers and the job coordinator:

Please take a look and provide feedback. I would also really appreciate
help in designing a way to propagate the error up from SamzaContainer in
order to exit the container with a non-zero exit code.
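
To make the question concrete, here is a minimal sketch of one possible
shape for that propagation. It is only an illustration under stated
assumptions, not the design in the SEP and not existing Samza code: the
names HeartbeatSketch, HeartbeatLostException, heartbeatLoop, runContainer
and the exit code values are all hypothetical, and the heartbeat call is
modeled as a plain BooleanSupplier rather than a real request to the job
coordinator.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch: none of these names come from Samza itself.
public class HeartbeatSketch {

    /** Raised when the (simulated) job coordinator stops acknowledging heartbeats. */
    public static class HeartbeatLostException extends RuntimeException {
        public HeartbeatLostException(String message) {
            super(message);
        }
    }

    // Assumed exit codes, chosen only for this illustration.
    public static final int EXIT_OK = 0;
    public static final int EXIT_HEARTBEAT_LOST = 1;

    /**
     * Sends up to {@code beats} heartbeats. Throws once {@code maxMisses}
     * consecutive heartbeats go unacknowledged; a success resets the count.
     */
    public static void heartbeatLoop(BooleanSupplier sendHeartbeat, int beats, int maxMisses) {
        int consecutiveMisses = 0;
        for (int i = 0; i < beats; i++) {
            if (sendHeartbeat.getAsBoolean()) {
                consecutiveMisses = 0;
            } else {
                consecutiveMisses++;
                if (consecutiveMisses >= maxMisses) {
                    throw new HeartbeatLostException(
                        "missed " + consecutiveMisses + " consecutive heartbeats");
                }
            }
        }
    }

    /** Catches the failure at the top level and maps it to an exit code. */
    public static int runContainer(BooleanSupplier sendHeartbeat, int beats, int maxMisses) {
        try {
            heartbeatLoop(sendHeartbeat, beats, maxMisses);
            return EXIT_OK;
        } catch (HeartbeatLostException e) {
            System.err.println("Container shutting down: " + e.getMessage());
            return EXIT_HEARTBEAT_LOST;
        }
    }

    public static void main(String[] args) {
        // A coordinator that always answers, then one that never answers.
        System.out.println("healthy exit code: " + runContainer(() -> true, 5, 3));
        System.out.println("lost exit code: " + runContainer(() -> false, 5, 3));
    }
}
```

In a real container the value returned by runContainer would be handed to
System.exit by the process entry point, so that YARN sees the non-zero
status. The sketch keeps that call out so that both cases can be run in one
process.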

