Mailing-List: contact dev-help@pig.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@pig.apache.org
Date: Fri, 22 May 2015 22:57:18 +0000 (UTC)
From: "Rohini Palaniswamy (JIRA)" <jira@apache.org>
To: pig-dev@hadoop.apache.org
Message-ID: <JIRA.12830325.1431727563000.14668.1432335438194@Atlassian.JIRA>
In-Reply-To: <JIRA.12830325.1431727563000@Atlassian.JIRA>
References: <JIRA.12830325.1431727563000@Atlassian.JIRA>
 <JIRA.12830325.1431727563191@arcas>
Subject: [jira] [Commented] (PIG-4555) Add -XX:+UseNUMA for Tez jobs
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/PIG-4555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14556958#comment-14556958 ] 

Rohini Palaniswamy commented on PIG-4555:
-----------------------------------------

bq. i end-up having my containers (the AM one) being killed because they use too much virtual memory (about 17GB of virtual memory)
   17GB is really bad. What was the Xmx setting? How much virtual memory does the AM use without NUMA?

bq. But for sure, in my case, setting -XX:+UseNUMA do trigger an OOM.
   Are you sure it hits an OOM, or is the container just being killed because the yarn.nodemanager.vmem-pmem-ratio limit is breached?
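
   To make the distinction concrete, here is a rough sketch of the virtual-memory limit implied by yarn.nodemanager.vmem-pmem-ratio. It is a simplification, not the NodeManager's actual monitoring code. The class and method names are hypothetical, and the 4096 MB container size is an assumed value, since the real AM container size was not given in this thread. The ratio of 2.1 is the YARN default.

```java
// Simplified sketch of the YARN virtual-memory limit check.
// Hypothetical helper for illustration only; not NodeManager code.
class VmemLimitSketch {
    // Default value of yarn.nodemanager.vmem-pmem-ratio.
    static final double DEFAULT_VMEM_PMEM_RATIO = 2.1;

    // Virtual memory allowed for a container: physical allocation times the ratio.
    static long vmemLimitBytes(long containerMemoryMb, double vmemPmemRatio) {
        return (long) (containerMemoryMb * 1024L * 1024L * vmemPmemRatio);
    }

    // True when the process tree's virtual memory is above the allowed limit.
    static boolean exceedsVmemLimit(long usedVmemBytes, long containerMemoryMb,
                                    double vmemPmemRatio) {
        return usedVmemBytes > vmemLimitBytes(containerMemoryMb, vmemPmemRatio);
    }

    public static void main(String[] args) {
        long containerMb = 4096;                    // assumed AM container size
        long usedVmem = 17L * 1024 * 1024 * 1024;   // the ~17GB reported above
        System.out.println("vmem limit (bytes): "
                + vmemLimitBytes(containerMb, DEFAULT_VMEM_PMEM_RATIO));
        System.out.println("over limit: "
                + exceedsVmemLimit(usedVmem, containerMb, DEFAULT_VMEM_PMEM_RATIO));
    }
}
```

   A container killed this way is a limit breach enforced by the NodeManager, which is different from the JVM itself throwing an OutOfMemoryError.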

bq. I'm pretty sure there is already some configuration variables one can set in its tez-site.xml file to set this option so no need to have pig force this setting by code. For what i understand, the real problem is not about -XX:+UseNUMA. The real problem is more that some option from the tez configuration are ignored.
   TEZ_AM_LAUNCH_CMD_OPTS_DEFAULT is "-XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseParallelGC", i.e. -XX:+UseNUMA is part of the default Tez AM options. In Pig, we give preference to the mapreduce AM settings (if tez.am.launch.cmd-opts is not overridden in tez-site.xml) and translate them to Tez instead of using the Tez defaults mentioned above. Since the mapreduce AM settings are always present, from either mapred-default.xml or mapred-site.xml, -XX:+UseNUMA never gets applied. So this issue is about making use of the default Tez settings in Pig. If -XX:+UseNUMA is problematic in a particular environment, it can be overridden in tez-site.xml.
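
   The precedence described above can be sketched roughly as follows. This is an illustration under assumptions, not Pig's actual code: the class AmOptsSketch and the method resolveAmOpts are hypothetical, plain maps stand in for the loaded configuration files, and "-Xmx1024m" is only an illustrative mapreduce AM value. The two property names and the Tez default string are the ones quoted in this issue.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of how the AM JVM options end up being chosen.
// Not Pig's real implementation; it only mirrors the precedence described above.
class AmOptsSketch {
    static final String TEZ_AM_OPTS = "tez.am.launch.cmd-opts";
    static final String MR_AM_OPTS = "yarn.app.mapreduce.am.command-opts";
    static final String TEZ_AM_LAUNCH_CMD_OPTS_DEFAULT =
            "-XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseParallelGC";

    static String resolveAmOpts(Map<String, String> tezSite, Map<String, String> mapred) {
        // 1. An explicit override in tez-site.xml always wins.
        String explicit = tezSite.get(TEZ_AM_OPTS);
        if (explicit != null) {
            return explicit;
        }
        // 2. Otherwise the mapreduce AM setting is translated to Tez. It is
        //    practically always present (mapred-default.xml or mapred-site.xml).
        String mr = mapred.get(MR_AM_OPTS);
        if (mr != null) {
            return mr;
        }
        // 3. The Tez default, which carries -XX:+UseNUMA, is only reached
        //    when neither of the above is set.
        return TEZ_AM_LAUNCH_CMD_OPTS_DEFAULT;
    }

    public static void main(String[] args) {
        Map<String, String> tezSite = new HashMap<>();   // no override in tez-site.xml
        Map<String, String> mapred = new HashMap<>();
        mapred.put(MR_AM_OPTS, "-Xmx1024m");             // illustrative value
        // The mapreduce value is picked, so -XX:+UseNUMA is missing.
        System.out.println(resolveAmOpts(tezSite, mapred));
    }
}
```

   Because step 2 almost always matches, step 3 is effectively dead code in practice, which is why the Tez default options never take effect unless tez.am.launch.cmd-opts is set explicitly.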

The real issue of why the Tez AM performed poorly without NUMA is still open and will be tracked in a TEZ jira. You have raised some concerns, and I don't have knowledgeable answers for them at this point. So I have moved this to 0.16 and will add the option once we better understand the NUMA behavior and what is happening in the Tez AM with and without it.

     
> Add -XX:+UseNUMA for Tez jobs
> -----------------------------
>
>                 Key: PIG-4555
>                 URL: https://issues.apache.org/jira/browse/PIG-4555
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Rohini Palaniswamy
>            Assignee: Rohini Palaniswamy
>             Fix For: 0.16.0
>
>
>     For very big Tez jobs (~50K tasks), the AM quickly goes OOM without -XX:+UseNUMA. The tez.am.launch.cmd-opts default setting has it, but since Pig gives preference to yarn.app.mapreduce.am.command-opts if present (which it usually is), -XX:+UseNUMA is not there. We need to add -XX:+UseNUMA if we are picking up the mapreduce setting.


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)