flink-user mailing list archives

From Yury Ruchin <yuri.ruc...@gmail.com>
Subject Re: Running into memory issues while running on Yarn
Date Thu, 05 Jan 2017 13:24:32 GMT
Hi,

Your containers got killed by YARN for exceeding virtual memory limits. For
some reason your containers intensively allocate virtual memory despite
having free physical memory.

There are some known gotchas with this issue on CentOS, caused by the OS's
aggressive virtual memory allocation: [1], [2]. Both recommend disabling the
YARN virtual memory checker to work around it.

People on this mailing list also recently reported that high virtual memory
consumption can be caused by certain libraries.
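For reference, the workaround described in [1] and [2] amounts to editing
yarn-site.xml on each NodeManager. A minimal sketch, assuming the standard
Hadoop 2.7.x property names (the 4.0 ratio below is just an illustrative
value):

```xml
<!-- yarn-site.xml: work around aggressive virtual memory allocation -->
<property>
  <!-- Option 1: disable the NodeManager's virtual memory check entirely -->
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>
<property>
  <!-- Option 2: keep the check, but allow more virtual memory per unit of
       physical memory. The default ratio is 2.1, which is exactly the
       "2.1 GB of 2.1 GB virtual memory used" limit in your logs. -->
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>4.0</value>
</property>
```

NodeManagers must be restarted for either change to take effect.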

Links:
[1]
http://blog.cloudera.com/blog/2014/04/apache-hadoop-yarn-avoiding-6-time-consuming-gotchas/,
section "Killing of Tasks Due to Virtual Memory Usage"
[2] https://www.mapr.com/blog/best-practices-yarn-resource-management,
section "3. Virtual/physical memory checker".

Regards,
Yury

2017-01-05 11:54 GMT+03:00 Sachin Goel <sachingoel0101@gmail.com>:

> Hey!
>
> I'm running locally under this configuration (copied from nodemanager logs):
> physical-memory=8192 virtual-memory=17204 virtual-cores=8
>
> Before starting a Flink deployment, memory usage stats show 3.7 GB used on
> the system, indicating plenty of free memory for Flink containers.
> However, after I submit with minimal resource requirements,
> ./yarn-session.sh -n 1 -tm 768, the cluster deploys successfully, but then
> every application on the system receives a SIGTERM, which basically kills
> the current user session, logging me out of the system.
>
> The job manager and task manager logs contain only the information that a
> SIGTERM was received and they shut down gracefully.
> All YARN and DFS processes contain log entries showing the receipt of a
> SIGTERM.
>
> Here's the relevant log from nodemanager:
>
> 2017-01-05 17:00:06,089 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1483603191971_0002_01_000002 transitioned from LOCALIZED to RUNNING
> 2017-01-05 17:00:06,092 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: launchContainer: [bash, /opt/hadoop-2.7.3/tmp/nm-local-dir/usercache/kirk/appcache/application_1483603191971_0002/container_1483603191971_0002_01_000002/default_container_executor.sh]
> 2017-01-05 17:00:08,731 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Starting resource-monitoring for container_1483603191971_0002_01_000002
> 2017-01-05 17:00:08,744 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 17872 for container-id container_1483603191971_0002_01_000001: 282.7 MB of 1 GB physical memory used; 2.1 GB of 2.1 GB virtual memory used
> 2017-01-05 17:00:08,744 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Process tree for container: container_1483603191971_0002_01_000001 has processes older than 1 iteration running over the configured limit. Limit=2254857728, current usage = 2255896576
> 2017-01-05 17:00:08,745 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Container [pid=17872,containerID=container_1483603191971_0002_01_000001] is running beyond virtual memory limits. Current usage: 282.7 MB of 1 GB physical memory used; 2.1 GB of 2.1 GB virtual memory used. Killing container.
> Dump of the process-tree for container_1483603191971_0002_01_000001 :
> 	|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
> 	|- 17872 17870 17872 17872 (bash) 0 0 21409792 812 /bin/bash -c /usr/lib/jvm/java-8-openjdk-amd64//bin/java -Xmx424M -Dlog.file=/opt/hadoop-2.7.3/logs/userlogs/application_1483603191971_0002/container_1483603191971_0002_01_000001/jobmanager.log -Dlogback.configurationFile=file:logback.xml -Dlog4j.configuration=file:log4j.properties org.apache.flink.yarn.YarnApplicationMasterRunner 1>/opt/hadoop-2.7.3/logs/userlogs/application_1483603191971_0002/container_1483603191971_0002_01_000001/jobmanager.out 2>/opt/hadoop-2.7.3/logs/userlogs/application_1483603191971_0002/container_1483603191971_0002_01_000001/jobmanager.err
> 	|- 17879 17872 17872 17872 (java) 748 20 2234486784 71553 /usr/lib/jvm/java-8-openjdk-amd64//bin/java -Xmx424M -Dlog.file=/opt/hadoop-2.7.3/logs/userlogs/application_1483603191971_0002/container_1483603191971_0002_01_000001/jobmanager.log -Dlogback.configurationFile=file:logback.xml -Dlog4j.configuration=file:log4j.properties org.apache.flink.yarn.YarnApplicationMasterRunner
>
> 2017-01-05 17:00:08,745 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Removed ProcessTree with root 17872
> 2017-01-05 17:00:08,746 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1483603191971_0002_01_000001 transitioned from RUNNING to KILLING
> 2017-01-05 17:00:08,746 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_1483603191971_0002_01_000001
> 2017-01-05 17:00:08,779 ERROR org.apache.hadoop.yarn.server.nodemanager.NodeManager: RECEIVED SIGNAL 15: SIGTERM
> 2017-01-05 17:00:08,822 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1483603191971_0002_01_000001 is : 143
> 2017-01-05 17:00:08,825 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1483603191971_0002_01_000002 is : 143
>
>
> Is the memory available on my PC insufficient, or are there any known
> issues that might lead to this?
>
> Also, this doesn't happen every time I start a Flink session.
>
> Thanks
> Sachin
>
