Message-ID: <50928B37.7030202@zfabrik.de>
Date: Thu, 01 Nov 2012 15:46:15 +0100
From: Henning Blohm
To: mapreduce-user@hadoop.apache.org
Subject: Re: Virtual memory problems on Ubuntu 12.04 (a.k.a. MALLOC_ARENA_MAX or HADOOP-7154)
In-Reply-To: <31615395.861.1351302327462.JavaMail.lancenorskog@Lance-Norskogs-MacBook-Pro.local>

On 10/27/2012 03:45 AM, Lance Norskog wrote:
> 1) Java uses different variables than 'malloc'. Look up 'Java garbage
> collection' to find out how it all works.

What are you trying to say here? This is not a garbage collection problem. It is a problem with the new per-thread arena allocator in glibc, which has exactly this effect on multi-threaded applications run in a JVM (tempted to say: look up "MALLOC_ARENA_MAX JVM virtual memory" to find out how it all works).

> 2) Is this a 32-bit kernel? Or Java version? Those top out at 2.1g
> address space. You need to run with a 64-bit kernel & Java to get real
> work done with Hadoop.

As said in the original post: it's all 64-bit.
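[Editor's note, not part of the original thread: a back-of-the-envelope illustration of the glibc behavior discussed here. Since glibc 2.10, a 64-bit multi-threaded process may create up to 8 malloc arenas per core, and each arena reserves its heap in 64 MiB chunks of virtual address space. That reservation is what YARN's vmem check counts, even though almost none of it is resident. The core count below is an assumption:]

```shell
# Sketch, assuming an 8-core machine (adjust cores to your hardware).
# Up to 8 arenas per core, each reserving 64 MiB of virtual address space.
cores=8
echo "worst-case arena reservation: $((cores * 8 * 64)) MiB"
# -> worst-case arena reservation: 4096 MiB
```

Capping the arena count with MALLOC_ARENA_MAX=1 (or 4) removes most of that reservation, which is why HADOOP-7154 recommends exporting it for JVM daemons and containers.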
--
Henning

> ----- Original Message -----
> | From: "Henning Blohm"
> | To: mapreduce-user@hadoop.apache.org
> | Sent: Thursday, October 25, 2012 8:52:00 AM
> | Subject: Re: Virtual memory problems on Ubuntu 12.04 (a.k.a. MALLOC_ARENA_MAX or HADOOP-7154)
> |
> | Could not get it to make sense out of MALLOC_ARENA_MAX. No .bashrc
> | etc., no env script seemed to have any impact.
> |
> | Made jobs work again by setting yarn.nodemanager.vmem-pmem-ratio=10.
> | Now they probably run with some obscene and unnecessary vmem
> | allocation (which I read does not come for free with the new malloc).
> | What a crappy situation (and change) :-(
> |
> | Thanks,
> | Henning
> |
> | On 10/25/2012 11:47 AM, Henning Blohm wrote:
> | > Recently I have installed data nodes on Ubuntu 12.04 and observed
> | > failing M/R jobs with errors like this:
> | >
> | > Diagnostics report from attempt_1351154628597_0002_m_000000_0:
> | > Container [pid=14529,containerID=container_1351154628597_0002_01_000002]
> | > is running beyond virtual memory limits. Current usage: 124.4mb of
> | > 1.0gb physical memory used; 2.1gb of 2.1gb virtual memory used.
> | > Killing container.
> | > Dump of the process-tree for container_1351154628597_0002_01_000002:
> | > |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS)
> | > SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
> | > |- 14529 13550 14529 14529 (java) 678 18 2265411584 31856
> | > /home/gd/gd/jdk1.6.0_35/bin/java -Djava.net.preferIPv4Stack=true
> | > -Dhadoop.metrics.log.level=WARN -Xmx1000M -XX:MaxPermSize=512M
> | > -Djava.io.tmpdir=/home/gd/gd/gi-de-nosql.cdh4-base/data/yarn/usercache/gd/appcache/application_1351154628597_0002/container_1351154628597_0002_01_000002/tmp
> | > -Dlog4j.configuration=container-log4j.properties
> | > -Dyarn.app.mapreduce.container.log.dir=/home/gd/gd/gi-de-nosql.cdh4-base/logs/application_1351154628597_0002/container_1351154628597_0002_01_000002
> | > -Dyarn.app.mapreduce.container.log.filesize=0
> | > -Dhadoop.root.logger=INFO,CLA org.apache.hadoop.mapred.YarnChild
> | > 192.168.178.25 36183 attempt_1351154628597_0002_m_000000_0 2
> | >
> | > I am using CDH4.0.1 (hadoop 2.0.0) with the Yarn M/R implementation
> | > on Ubuntu 12.04 64Bit.
> | >
> | > According to HADOOP-7154, making sure MALLOC_ARENA_MAX=1 (or 4) is
> | > exported should fix the issue.
> | >
> | > I tried the following:
> | >
> | > Exporting the environment variable MALLOC_ARENA_MAX with value 1 in
> | > all hadoop shell scripts (e.g. yarn-env.sh). Checking the
> | > launch_container.sh script that Yarn creates I can tell that it
> | > indeed contains the line
> | >
> | > export MALLOC_ARENA_MAX="1"
> | >
> | > But still I am getting the error above.
> | >
> | > In addition I tried adding
> | >
> | > <property>
> | >   <name>mapred.child.env</name>
> | >   <value>MALLOC_ARENA_MAX=1</value>
> | > </property>
> | >
> | > to mapred-site.xml. But that didn't seem to fix it either.
> | >
> | > Is there anything special that I need to configure on the server to
> | > make the setting effective?
> | >
> | > Any idea would be great!!
> | >
> | > Thanks,
> | > Henning
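[Editor's note, not part of the original thread: for reference, the workaround Henning describes (raising the virtual-to-physical memory ratio) would be expressed in yarn-site.xml roughly as follows. This is a sketch; the value 10 is the one used in this thread, not a general recommendation, and how much headroom you need depends on heap size and the glibc arena behavior discussed above:]

```xml
<!-- yarn-site.xml: relax the NodeManager's virtual/physical memory ratio
     so glibc's arena address-space reservations do not trip the vmem check. -->
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>10</value>
</property>
```

Later Hadoop 2.x versions also allow disabling the virtual memory check entirely via yarn.nodemanager.vmem-check-enabled, though whether that property is honored in CDH4.0.1's YARN would need to be verified.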