flink-user mailing list archives

From Joshua Griffith <JGriff...@CampusLabs.com>
Subject Re: Flink Jobs disappear
Date Mon, 10 Jul 2017 16:29:20 GMT
Are your containers on separate nodes? Are you running in Kubernetes? Have you set hard resource
limits?

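When I’ve run into this issue it’s been because the JobManager was restarted (I wasn’t
running in HA mode). Without HA a restarted JobManager does not recover its running jobs, so
they vanish from the web UI instead of moving to a failed state. Your node could have been
restarted, or Docker could have OOM-killed the process if the machine was low on memory. You
might want to run `docker ps` to see if your containers are restarting. Exit code 137 means
the process received SIGKILL (128 + 9), which probably means it was OOM-killed.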
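
A minimal sketch of how the exit-code check could be automated, assuming you feed it the
"State" object from the JSON that `docker inspect <container>` prints. The `ExitCode` and
`OOMKilled` fields are the ones Docker reports; the function name `describe_exit` and the
sample data are made up for illustration.

```python
import json
import signal


def describe_exit(state: dict) -> str:
    """Summarise the "State" object of one container from `docker inspect`."""
    code = state.get("ExitCode", 0)
    # Docker sets OOMKilled when the kernel OOM killer terminated the container.
    if state.get("OOMKilled"):
        return "OOM-killed by the kernel"
    # Exit codes above 128 conventionally mean "terminated by signal (code - 128)".
    if code > 128:
        try:
            name = signal.Signals(code - 128).name
        except ValueError:
            name = f"signal {code - 128}"
        return f"killed by {name}"
    if code == 0:
        return "exited cleanly"
    return f"exited with error code {code}"


# Hypothetical, trimmed-down `docker inspect` output for a single container.
SAMPLE = '[{"Name": "/jobmanager", "State": {"ExitCode": 137, "OOMKilled": true}}]'

if __name__ == "__main__":
    for container in json.loads(SAMPLE):
        print(container["Name"], "->", describe_exit(container["State"]))
```

Exit code 137 on its own only tells you the process got SIGKILL; the `OOMKilled` flag is
what distinguishes an out-of-memory kill from a manual `docker kill`.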

I wouldn’t run the JobManager on the same node as TaskManagers unless you’re using hard
resource limits. Note: if you decide to go the hard resource limit route, know that Docker
enforces the limit on the container’s cgroup memory usage, which includes page cache from
memory-mapped files as well as RSS (watch out for mmap).

> On Jul 8, 2017, at 1:54 AM, Chesnay Schepler <chesnay@apache.org> wrote:
> 
> If a TaskManager ran out of memory there should be something in the JobManager logs about
> an unreachable TaskManager.
> That said, there should also be something in the JobManager logs about the job disappearing...
> 
> Could you set the logging level to DEBUG, run the job again, and provide us (or me directly)
> with the logs?
> 
> Regards,
> Chesnay
> 
> On 08.07.2017 08:44, G.S.Vijay Raajaa wrote:
>> Hi Chesnay,
>> 
>> I am currently running Flink 1.3 in Docker containers, not in HA mode. I have three
>> TaskManagers and one JobManager. This happens randomly, not every time. Does it mean
>> a TaskManager ran out of memory? I have configured more slots than there are available
>> cores; I hope compute is shared round-robin. Any pointers on tuning and HA setup will be
>> greatly appreciated.
>> 
>> Regards,
>> Vijay Raajaa GS
>> 
>> On Sat, Jul 8, 2017 at 12:04 PM, Chesnay Schepler <chesnay@apache.org> wrote:
>> Hello,
>> 
>> could you tell us a bit more about your setup? Which Flink version are you using,
>> is HA enabled, does this happen every time, etc.?
>> Regards,
>> Chesnay
>> 
>> 
>> On 06.07.2017 21:43, G.S.Vijay Raajaa wrote:
>> Hi,
>> 
>> I am running the Flink TaskManager and JobManager as Docker containers. Strangely,
>> the jobs disappear from the web portal after some time. They don't move to the
>> failed state either. Any pointers would be really helpful; I can't get a clue from the
>> logs.
>> 
>> Kindly let me know if I need specific tuning, and how to persist the uploaded jars.
>> 
>> Regards,
>> Vijay Raajaa G S
>> 
>> 
>> 
> 

