flink-user mailing list archives

From Stephan Ewen <se...@apache.org>
Subject Re: High virtual memory usage
Date Fri, 16 Dec 2016 18:06:25 GMT
Also, can you tell us what OS you are running on?

On Fri, Dec 16, 2016 at 6:23 PM, Stephan Ewen <sewen@apache.org> wrote:

> Hi!
>
> To diagnose this a little better, can you help us with the following info:
>
>   - Are you using RocksDB?
>   - What is your flink configuration, especially around memory settings?
>   - What do you use for TaskManager heap size? Any manual value, or do you
> let Flink/Yarn set it automatically based on container size?
>   - Do you use any libraries or connectors in your program?
>
> Greetings,
> Stephan
>
>
> On Fri, Dec 16, 2016 at 5:47 PM, Paulo Cezar <paulo.cezar@gogeo.io> wrote:
>
>> Hi Folks,
>>
>> I'm running Flink (1.2-SNAPSHOT nightly) on YARN (Hadoop 2.7.2). A few
>> hours after I start a streaming job (built using the Kafka connector 0.10_2.11)
>> it gets killed seemingly for no reason. After inspecting the logs my best
>> guess is that YARN is killing containers due to high virtual memory usage.
>>
>> Any guesses on why this might be happening, or tips on what I should be
>> looking for?
>>
>> What I'll do next is enable taskmanager.debug.memory.startLogThread to
>> keep investigating. Also, I was deploying
>> flink-1.2-SNAPSHOT-bin-hadoop2.tgz
>> <https://s3.amazonaws.com/flink-nightly/flink-1.2-SNAPSHOT-bin-hadoop2.tgz>
>> on YARN, but my job uses scala 2.11 dependencies so I'll try using
>> flink-1.2-SNAPSHOT-bin-hadoop2_2.11.tgz
>> <https://s3.amazonaws.com/flink-nightly/flink-1.2-SNAPSHOT-bin-hadoop2_2.11.tgz>
>> instead.
>>
>>
>>    - Flink logs:
>>
>> 2016-12-15 17:44:03,763 WARN  akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@10.0.0.8:49832] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
>> 2016-12-15 17:44:05,475 INFO  org.apache.flink.yarn.YarnFlinkResourceManager - Container ResourceID{resourceId='container_1481732559439_0002_01_000004'} failed. Exit status: 1
>> 2016-12-15 17:44:05,476 INFO  org.apache.flink.yarn.YarnFlinkResourceManager - Diagnostics for container ResourceID{resourceId='container_1481732559439_0002_01_000004'} in state COMPLETE : exitStatus=1 diagnostics=Exception from container-launch.
>> Container id: container_1481732559439_0002_01_000004
>> Exit code: 1
>> Stack trace: ExitCodeException exitCode=1:
>> 	at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
>> 	at org.apache.hadoop.util.Shell.run(Shell.java:456)
>> 	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
>> 	at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
>> 	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
>> 	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
>> 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> 	at java.lang.Thread.run(Thread.java:745)
>>
>>
>> Container exited with a non-zero exit code 1
>>
>>
>>
>>    - YARN logs:
>>
>> container_1481732559439_0002_01_000004: 2.6 GB of 5 GB physical memory used; 38.1 GB of 10.5 GB virtual memory used
>> 2016-12-15 17:44:03,119 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 62223 for container-id container_1481732559439_0002_01_000001: 656.3 MB of 2 GB physical memory used; 3.2 GB of 4.2 GB virtual memory used
>> 2016-12-15 17:44:03,766 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1481732559439_0002_01_000004 is : 1
>> 2016-12-15 17:44:03,766 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exception from container-launch with container ID: container_1481732559439_0002_01_000004 and exit code: 1
>> ExitCodeException exitCode=1:
>> 	at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
>> 	at org.apache.hadoop.util.Shell.run(Shell.java:456)
>> 	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
>> 	at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
>> 	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
>> 	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
>> 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> 	at java.lang.Thread.run(Thread.java:745)
>>
>>
>> Best regards,
>> Paulo Cezar
>>
>
>
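
For anyone reading the YARN log lines quoted above: the "X of Y physical memory used; A of B virtual memory used" summaries can be checked mechanically. The sketch below is only an illustration, not part of Flink or YARN. The names parse_usage and to_mb are hypothetical helpers, and the regular expression assumes the line format shown in the quoted logs. It reports the ratio of the virtual limit to the physical limit (10.5 GB / 5 GB = 2.1 here, which matches the default of yarn.nodemanager.vmem-pmem-ratio) and whether virtual usage is above the limit.

```python
import re

# Assumed format, taken from the quoted ContainersMonitorImpl lines, e.g.
# "2.6 GB of 5 GB physical memory used; 38.1 GB of 10.5 GB virtual memory used"
USAGE = re.compile(
    r"(?P<pu>[\d.]+)\s+(?P<pu_unit>[KMG]B)\s+of\s+"
    r"(?P<pl>[\d.]+)\s+(?P<pl_unit>[KMG]B)\s+physical memory used;\s+"
    r"(?P<vu>[\d.]+)\s+(?P<vu_unit>[KMG]B)\s+of\s+"
    r"(?P<vl>[\d.]+)\s+(?P<vl_unit>[KMG]B)\s+virtual memory used"
)

# Conversion factors to megabytes.
UNIT_TO_MB = {"KB": 1.0 / 1024.0, "MB": 1.0, "GB": 1024.0}


def to_mb(value, unit):
    """Convert a number with a KB/MB/GB unit into megabytes."""
    return float(value) * UNIT_TO_MB[unit]


def parse_usage(line):
    """Hypothetical helper: extract memory usage from one YARN monitor line.

    Returns None if the line does not contain a memory summary.
    """
    m = USAGE.search(line)
    if m is None:
        return None
    phys_used = to_mb(m.group("pu"), m.group("pu_unit"))
    phys_limit = to_mb(m.group("pl"), m.group("pl_unit"))
    virt_used = to_mb(m.group("vu"), m.group("vu_unit"))
    virt_limit = to_mb(m.group("vl"), m.group("vl_unit"))
    return {
        "phys_used_mb": phys_used,
        "phys_limit_mb": phys_limit,
        "virt_used_mb": virt_used,
        "virt_limit_mb": virt_limit,
        # Ratio of the two limits, i.e. the effective vmem-pmem ratio.
        "vmem_pmem_ratio": virt_limit / phys_limit,
        "over_virtual_limit": virt_used > virt_limit,
        "over_physical_limit": phys_used > phys_limit,
    }


if __name__ == "__main__":
    sample = ("container_1481732559439_0002_01_000004: 2.6 GB of 5 GB physical "
              "memory used; 38.1 GB of 10.5 GB virtual memory used")
    print(parse_usage(sample))
```

Running this over a NodeManager log and filtering on over_virtual_limit gives a quick view of which containers exceed the virtual memory limit, and by how much, before changing any memory settings.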
