hadoop-common-user mailing list archives

From Tsuyoshi OZAWA <ozawa.tsuyo...@gmail.com>
Subject Re: virtual memory consumption
Date Thu, 11 Sep 2014 11:29:40 GMT
Hi Jakub,

You have 2 options:
1. Turning off the virtual memory check, as you mentioned.
2. Making yarn.nodemanager.vmem-pmem-ratio larger.
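
For context on where the 2.1 GB ceiling in your log comes from: YARN
computes each container's virtual memory limit as the container size
multiplied by the ratio, so for your reduce task

    1024 MB (mapreduce.reduce.memory.mb)
      x 2.1 (yarn.nodemanager.vmem-pmem-ratio, the default)
      = 2150.4 MB ~= 2.1 GB

which matches the "2.1 GB of 2.1 GB virtual memory used" in the error.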

Option 1 is a reasonable choice if you cannot predict virtual memory
usage in advance, or if you have no practical way to check an
application's virtual memory usage.
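
If it helps, here is a minimal yarn-site.xml sketch covering both
options (the property names are the standard YARN ones; the 3.5 below
is only an example value, not a recommendation):

    <!-- Option 1: disable the virtual memory check entirely -->
    <property>
      <name>yarn.nodemanager.vmem-check-enabled</name>
      <value>false</value>
    </property>

    <!-- Option 2: allow more virtual memory per unit of physical
         memory (the default ratio is 2.1) -->
    <property>
      <name>yarn.nodemanager.vmem-pmem-ratio</name>
      <value>3.5</value>
    </property>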

Thanks,
- Tsuyoshi



On Thu, Sep 11, 2014 at 7:24 PM, Jakub Stransky <stransky.ja@gmail.com>
wrote:

> Hi,
>
> thanks for the reply. The machine is pretty small, with only 4 GB of
> total memory. We reserved 1 GB for the OS and 1 GB for HBase (per the
> recommendation), so 2 GB remain, which is what the NodeManager claims.
>
> Actually it is a cluster of 5 machines: 2 name nodes and 3 data nodes.
> All machines have similar specs, so the stronger ones are used for the
> NNs and the rest for the DNs. I know the hardware is far from ideal,
> but it is a small cluster for a POC and for gaining experience.
>
> Back to the problem. At the time this happens, no other job is running
> on the cluster. All mappers (3) have already finished, and we have a
> single reduce task which fails at ~70% of its progress on virtual
> memory consumption. The dataset being processed is a 500 MB compressed
> Avro data file. The reducer doesn't intentionally cache anything; it
> just divides records into various folders dynamically.
> From the RM console I can clearly see that there are free, unused
> memory resources. Is there a way to detect what consumed the assigned
> virtual memory? For a smaller amount of input data (~120 MB
> compressed), the job finishes just fine within 3 minutes.
>
> We obviously have a problem scaling the task out. Could someone provide
> some hints? It seems we are missing something fundamental here.
>
> Thanks for helping me out
> Jakub
>
> On 11 September 2014 11:34, Susheel Kumar Gadalay <skgadalay@gmail.com>
> wrote:
>
>> Your physical memory is 1 GB on this node.
>>
>> What other containers (map tasks) are running on it?
>>
>> You have given map memory as 768M, reduce memory as 1024M, and AM
>> memory as 1024M.
>>
>> With the AM and a single map task that is already ~1.8G (1024M +
>> 768M), so another container cannot be started for the reducer.
>> Reduce these values and check.
>>
>> On 9/11/14, Jakub Stransky <stransky.ja@gmail.com> wrote:
>> > Hello hadoop users,
>> >
>> > I am facing the following issue when running an M/R job, during the
>> > reduce phase:
>> >
>> > Container [pid=22961,containerID=container_1409834588043_0080_01_000010]
>> > is running beyond virtual memory limits. Current usage: 636.6 MB of
>> > 1 GB physical memory used; 2.1 GB of 2.1 GB virtual memory used.
>> > Killing container. Dump of the process-tree for
>> > container_1409834588043_0080_01_000010 :
>> > |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS)
>> >    SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
>> > |- 22961 16896 22961 22961 (bash) 0 0 9424896 312 /bin/bash -c
>> >    /usr/java/default/bin/java -Djava.net.preferIPv4Stack=true
>> >    -Dhadoop.metrics.log.level=WARN -Xmx768m
>> >    -Djava.io.tmpdir=/home/hadoop/yarn/local/usercache/jobsubmit/appcache/application_1409834588043_0080/container_1409834588043_0080_01_000010/tmp
>> >    -Dlog4j.configuration=container-log4j.properties
>> >    -Dyarn.app.container.log.dir=/home/hadoop/yarn/logs/application_1409834588043_0080/container_1409834588043_0080_01_000010
>> >    -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA
>> >    org.apache.hadoop.mapred.YarnChild 153.87.47.116 47184
>> >    attempt_1409834588043_0080_r_000000_0 10
>> >    1>/home/hadoop/yarn/logs/application_1409834588043_0080/container_1409834588043_0080_01_000010/stdout
>> >    2>/home/hadoop/yarn/logs/application_1409834588043_0080/container_1409834588043_0080_01_000010/stderr
>> > |- 22970 22961 22961 22961 (java) 24692 1165 2256662528 162659
>> >    /usr/java/default/bin/java -Djava.net.preferIPv4Stack=true
>> >    -Dhadoop.metrics.log.level=WARN -Xmx768m
>> >    -Djava.io.tmpdir=/home/hadoop/yarn/local/usercache/jobsubmit/appcache/application_1409834588043_0080/container_1409834588043_0080_01_000010/tmp
>> >    -Dlog4j.configuration=container-log4j.properties
>> >    -Dyarn.app.container.log.dir=/home/hadoop/yarn/logs/application_1409834588043_0080/container_1409834588043_0080_01_000010
>> >    -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA
>> >    org.apache.hadoop.mapred.YarnChild 153.87.47.116 47184
>> >    attempt_1409834588043_0080_r_000000_0 10
>> >
>> > Container killed on request. Exit code is 143
>> >
>> >
>> > I have the following settings, with the default physical-to-virtual
>> > memory ratio set to 2.1:
>> > # hadoop - yarn-site.xml
>> > yarn.nodemanager.resource.memory-mb  : 2048
>> > yarn.scheduler.minimum-allocation-mb : 256
>> > yarn.scheduler.maximum-allocation-mb : 2048
>> >
>> > # hadoop - mapred-site.xml
>> > mapreduce.map.memory.mb              : 768
>> > mapreduce.map.java.opts              : -Xmx512m
>> > mapreduce.reduce.memory.mb           : 1024
>> > mapreduce.reduce.java.opts           : -Xmx768m
>> > mapreduce.task.io.sort.mb            : 100
>> > yarn.app.mapreduce.am.resource.mb    : 1024
>> > yarn.app.mapreduce.am.command-opts   : -Xmx768m
>> >
>> > I have the following questions:
>> > - Is it possible to track down the VM consumption and find what
>> > caused such high virtual memory usage?
>> > - What is the best way to solve this kind of problem?
>> > - I found the following recommendation on the internet: "We actually
>> > recommend disabling this check by setting
>> > yarn.nodemanager.vmem-check-enabled to false, as there is reason to
>> > believe the virtual/physical ratio is exceptionally high with some
>> > versions of Java / Linux." Is that a good way to go?
>> >
>> > My reduce task doesn't perform any heavy work: it just classifies the
>> > data, choosing the appropriate output folder for a given input key and
>> > writing the data out.
>> >
>> > Thanks for any advice
>> > Jakub
>> >
>>
>
>
>
> --
> Jakub Stransky
> cz.linkedin.com/in/jakubstransky
>
>


-- 
- Tsuyoshi
