hadoop-mapreduce-user mailing list archives

From Jakub Stransky <stransky...@gmail.com>
Subject Re: virtual memory consumption
Date Thu, 11 Sep 2014 13:35:08 GMT
Hi Tsuyoshi,

thanks for your summary. Option 1 is the obvious safety net for getting the
job through, but it could be dangerous. What caught my attention
was "if you cannot predict virtual memory usage in advance or you don't
have any applications to check virtual memory".

So my questions are:
1) How can I predict virtual memory usage?
2) What do you mean by an application to check virtual memory?
3) I suppose the parameters for both options 1 and 2 are cluster-wide. Or is
it possible to set vmem-pmem-ratio on a per-job basis?
4) Finally, a general one following from the previous question: does Hadoop
offer any clue as to which parameters are configurable per cluster, per
node, or per job? Or is it documented somewhere? Sometimes it is clear,
but sometimes I am just guessing.

Below is a virtual memory dump for the following failed task:
Container [pid=32193,containerID=container_1409834588043_0198_01_000010] is
running beyond virtual memory limits. Current usage: 648.5 MB of 1 GB
physical memory used; 2.1 GB of 2.1 GB virtual memory used. Killing
container.
Dump of the process-tree for container_1409834588043_0198_01_000010 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 32201 32193 32193 32193 (java) 28628 1409 2252427264 165717 /usr/java/default/bin/java
-Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx768m
-Djava.io.tmpdir=/home/hadoop/yarn/local/usercache/jobsubmit/appcache/application_1409834588043_0198/container_1409834588043_0198_01_000010/tmp
-Dlog4j.configuration=container-log4j.properties
-Dyarn.app.container.log.dir=/home/hadoop/yarn/logs/application_1409834588043_0198/container_1409834588043_0198_01_000010
-Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA
org.apache.hadoop.mapred.YarnChild 153.87.47.111 52590
attempt_1409834588043_0198_r_000000_0 10
|- 32193 21791 32193 32193 (bash) 0 0 9424896 307 /bin/bash -c /usr/java/default/bin/java
-Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx768m
-Djava.io.tmpdir=/home/hadoop/yarn/local/usercache/jobsubmit/appcache/application_1409834588043_0198/container_1409834588043_0198_01_000010/tmp
-Dlog4j.configuration=container-log4j.properties
-Dyarn.app.container.log.dir=/home/hadoop/yarn/logs/application_1409834588043_0198/container_1409834588043_0198_01_000010
-Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA
org.apache.hadoop.mapred.YarnChild 153.87.47.111 52590
attempt_1409834588043_0198_r_000000_0 10
1>/home/hadoop/yarn/logs/application_1409834588043_0198/container_1409834588043_0198_01_000010/stdout
2>/home/hadoop/yarn/logs/application_1409834588043_0198/container_1409834588043_0198_01_000010/stderr
Container killed on request. Exit code is 143

I have similar dumps at various stages of the task.
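
As a rough cross-check of the dump above, the following sketch adds up the VMEM_USAGE(BYTES) and RSSMEM_USAGE(PAGES) columns for the two processes and compares the total with the container limit. It rests on assumptions that the log does not state: that the virtual memory limit is the container size (1024 MB, from mapreduce.reduce.memory.mb) multiplied by the 2.1 ratio, that pages are 4 KiB, and that "MB"/"GB" in the message mean 2^20/2^30 bytes. The function name `summarize` is made up for illustration.

```python
# Sketch: recompute the figures in the kill message from the process-tree dump.
# Assumptions (not stated in the log): 4 KiB pages, binary MB/GB units, and
# vmem limit = container size * vmem-pmem ratio.

PAGE_SIZE = 4096          # assumed page size in bytes
MIB = 1024 ** 2
GIB = 1024 ** 3

# (pid, command, VMEM_USAGE(BYTES), RSSMEM_USAGE(PAGES)) copied from the dump
processes = [
    (32201, "java", 2252427264, 165717),
    (32193, "bash", 9424896, 307),
]

container_mb = 1024       # mapreduce.reduce.memory.mb from the thread
vmem_pmem_ratio = 2.1     # default ratio mentioned in the thread


def summarize(procs, container_mb, ratio, page_size=PAGE_SIZE):
    """Sum the per-process columns and compare them with the assumed limit."""
    vmem_used = sum(p[2] for p in procs)              # bytes
    rss_used = sum(p[3] for p in procs) * page_size   # pages -> bytes
    vmem_limit = container_mb * MIB * ratio           # bytes
    return {
        "vmem_used_gib": vmem_used / GIB,
        "vmem_limit_gib": vmem_limit / GIB,
        "rss_used_mib": rss_used / MIB,
        "over_vmem_limit": vmem_used > vmem_limit,
    }


summary = summarize(processes, container_mb, vmem_pmem_ratio)
for key, value in summary.items():
    print(key, value)
```

Under these assumptions the resident total comes out at about 648.5 MiB, which agrees with the "648.5 MB" in the message, and the virtual total is only a few megabytes above the assumed limit, a difference that the rounded "2.1 GB of 2.1 GB" hides. The java process accounts for almost all of the virtual memory, far more than its -Xmx768m heap, so the heap setting alone does not bound the figure being checked.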

Thanks for helping me out
Jakub

On 11 September 2014 13:29, Tsuyoshi OZAWA <ozawa.tsuyoshi@gmail.com> wrote:

> Hi Jakub,
>
> You have 2 options:
> 1. Turning off the virtual memory check, as you mentioned.
> 2. Making yarn.nodemanager.vmem-pmem-ratio larger.
>
> Option 1 is a reasonable choice if you cannot predict virtual memory usage in
> advance or you don't have any applications to check virtual memory.
>
> Thanks,
> - Tsuyoshi
>
>
>
> On Thu, Sep 11, 2014 at 7:24 PM, Jakub Stransky <stransky.ja@gmail.com>
> wrote:
>
>> Hi,
>>
>> thanks for the reply. The machine is pretty small: it has 4 GB of memory in
>> total. We reserved 1 GB for the OS and 1 GB for HBase (as recommended), so
>> 2 GB remain, which is what the NodeManager claims.
>>
>> Actually it is a cluster of 5 machines: 2 name nodes and 3 data nodes. All
>> machines have similar parameters, so the stronger ones are used for the
>> name nodes and the rest for the data nodes. I know the hardware is far from
>> ideal, but it is a small cluster for a POC and for gaining some experience.
>>
>> Back to the problem. When this happens, no other job is running on the
>> cluster. All 3 mappers have already finished and a single reduce task
>> remains, which fails at ~70% of its progress because of virtual memory
>> consumption. The dataset being processed is a 500 MB compressed Avro data
>> file. The reducer doesn't intentionally cache anything; it just distributes
>> records into various folders dynamically.
>> In the RM console I can clearly see free, unused resources (memory). Is
>> there a way to detect what consumed the assigned virtual memory? With a
>> smaller input, ~120 MB of compressed data, the job finishes just fine
>> within 3 minutes.
>>
>> We obviously have a problem scaling the task out. Could someone
>> provide some hints? It seems that we are missing something fundamental
>> here.
>>
>> Thanks for helping me out
>> Jakub
>>
>> On 11 September 2014 11:34, Susheel Kumar Gadalay <skgadalay@gmail.com>
>> wrote:
>>
>>> Your physical memory is 1GB on this node.
>>>
>>> What are the other containers (map tasks) running on this?
>>>
>>> You have given map memory as 768M, reduce memory as 1024M, and AM memory
>>> as 1024M.
>>>
>>> With the AM and a single map task, that is already about 1.7 GB, so another
>>> container cannot be started for the reducer.
>>> Reduce these values and check.
>>>
>>> On 9/11/14, Jakub Stransky <stransky.ja@gmail.com> wrote:
>>> > Hello hadoop users,
>>> >
>>> > I am facing the following issue during the reduce phase of an M/R job:
>>> >
>>> > Container [pid=22961,containerID=container_1409834588043_0080_01_000010] is
>>> > running beyond virtual memory limits. Current usage: 636.6 MB of 1 GB
>>> > physical memory used; 2.1 GB of 2.1 GB virtual memory used.
>>> > Killing container. Dump of the process-tree for
>>> > container_1409834588043_0080_01_000010 :
>>> > |- PID    PPID  PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS)
>>> > SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
>>> > |- 22961 16896 22961 22961 (bash) 0 0 9424896 312 /bin/bash -c
>>> > /usr/java/default/bin/java -Djava.net.preferIPv4Stack=true
>>> > -Dhadoop.metrics.log.level=WARN -Xmx768m
>>> > -Djava.io.tmpdir=/home/hadoop/yarn/local/usercache/jobsubmit/appcache/application_1409834588043_0080/container_1409834588043_0080_01_000010/tmp
>>> > -Dlog4j.configuration=container-log4j.properties
>>> > -Dyarn.app.container.log.dir=/home/hadoop/yarn/logs/application_1409834588043_0080/container_1409834588043_0080_01_000010
>>> > -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA
>>> > org.apache.hadoop.mapred.YarnChild 153.87.47.116 47184
>>> > attempt_1409834588043_0080_r_000000_0 10
>>> > 1>/home/hadoop/yarn/logs/application_1409834588043_0080/container_1409834588043_0080_01_000010/stdout
>>> > 2>/home/hadoop/yarn/logs/application_1409834588043_0080/container_1409834588043_0080_01_000010/stderr
>>> > |- 22970 22961 22961 22961 (java) 24692 1165 2256662528 162659
>>> > /usr/java/default/bin/java -Djava.net.preferIPv4Stack=true
>>> > -Dhadoop.metrics.log.level=WARN -Xmx768m
>>> > -Djava.io.tmpdir=/home/hadoop/yarn/local/usercache/jobsubmit/appcache/application_1409834588043_0080/container_1409834588043_0080_01_000010/tmp
>>> > -Dlog4j.configuration=container-log4j.properties
>>> > -Dyarn.app.container.log.dir=/home/hadoop/yarn/logs/application_1409834588043_0080/container_1409834588043_0080_01_000010
>>> > -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA
>>> > org.apache.hadoop.mapred.YarnChild 153.87.47.116 47184
>>> > attempt_1409834588043_0080_r_000000_0 10
>>> > Container killed on request. Exit code is 143
>>> >
>>> >
>>> > I have the following settings, with the default virtual-to-physical memory ratio of 2.1:
>>> > # hadoop - yarn-site.xml
>>> > yarn.nodemanager.resource.memory-mb  : 2048
>>> > yarn.scheduler.minimum-allocation-mb : 256
>>> > yarn.scheduler.maximum-allocation-mb : 2048
>>> >
>>> > # hadoop - mapred-site.xml
>>> > mapreduce.map.memory.mb              : 768
>>> > mapreduce.map.java.opts              : -Xmx512m
>>> > mapreduce.reduce.memory.mb           : 1024
>>> > mapreduce.reduce.java.opts           : -Xmx768m
>>> > mapreduce.task.io.sort.mb            : 100
>>> > yarn.app.mapreduce.am.resource.mb    : 1024
>>> > yarn.app.mapreduce.am.command-opts   : -Xmx768m
>>> >
>>> > I have the following questions:
>>> > - Is it possible to track down the virtual memory consumption, i.e. to
>>> > find what caused such high virtual memory usage?
>>> > - What is the best way to solve this kind of problem?
>>> > - I found the following recommendation on the internet: "We actually
>>> > recommend disabling this check by setting
>>> > yarn.nodemanager.vmem-check-enabled to false as there is reason to
>>> > believe the virtual/physical ratio is exceptionally high with some
>>> > versions of Java / Linux." Is that a good way to go?
>>> >
>>> > My reduce task doesn't do anything demanding: it just classifies the
>>> > data. For a given input key, it chooses the appropriate output folder and
>>> > writes the data out.
>>> >
>>> > Thanks for any advice
>>> > Jakub
>>> >
>>>
>>
>>
>>
>> --
>> Jakub Stransky
>> cz.linkedin.com/in/jakubstransky
>>
>>
>
>
> --
> - Tsuyoshi
>



-- 
Jakub Stransky
cz.linkedin.com/in/jakubstransky
