pig-dev mailing list archives

From "Remi Catherinot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-4555) Add -XX:+UseNUMA for Tez jobs
Date Tue, 26 May 2015 08:26:17 GMT

    [ https://issues.apache.org/jira/browse/PIG-4555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14558850#comment-14558850
] 

Remi Catherinot commented on PIG-4555:
--------------------------------------

Hi, I've written my answers inline in the original mail below, prefixed with "Remi: ", in
addition to what follows.

I'm not the creator of the initial JIRA, who is the one wanting to pass UseNUMA to the Tez
AM JVM through Pig. Like you said, Tez's poor efficiency is a Tez issue, not a Pig one.
As for Pig, the 'real' problem is rather that a certain level of expertise is needed
to finely control which options end up being used to launch the Tez AM. I made such a
configuration mistake in my first tests using +UseNUMA. It's hard to know, among all the
possible ways to set command-line options, which ones will end up on the final command line.

For me, the case can be closed as not being a bug, and if it still needs a fix, it would
rather be a documentation fix explaining how to control command-line options with Tez/YARN.

Another point: I do fine-tune my servers. I use interrupt pinning, a certain level of
process/CPU affinity, low-level settings for Linux kernel modules and drivers, block device
settings, sysctl settings, read/write disk cache ratios, disabling hyper-threading, and so
on. I play a lot with NUMA too, and with some other -XX JVM options. Even though I messed up
my first tests, adding UseNUMA, which splits the young generation across NUMA nodes, would be
more likely to trigger a real OOM than to solve one (because each part of the young
generation is smaller, one small chunk per NUMA node, and I'm not sure whether the JVM will
use the young-generation space of another NUMA node when one is full), unless there is a bug
in the JVM itself when interleaving the heap's young generation. UseNUMA does not change the
amount of memory the JVM can use (and so neither the amount Tez can use inside the JVM). That
is also why I reacted to the JIRA in the first place: I'm pretty sure the real problem is not
where the JIRA suggests it is. Maybe the author had a problem like mine: when forcing the
UseNUMA option, they also forced some other options, and maybe it is those options that
solved the OOM issue.

-----Original Message-----
From: Rohini Palaniswamy (JIRA) [mailto:jira@apache.org] 
Sent: Saturday, 23 May 2015 00:57
To: CATHERINOT Rémi Ext DTSI/DERS
Subject: [jira] [Commented] (PIG-4555) Add -XX:+UseNUMA for Tez jobs


    [ https://issues.apache.org/jira/browse/PIG-4555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14556958#comment-14556958
] 

Rohini Palaniswamy commented on PIG-4555:
-----------------------------------------

bq. i end-up having my containers (the AM one) being killed because they use too much virtual
memory (about 17GB of virtual memory)
   17GB is really bad. How much was the Xmx? What is the virtual memory without NUMA?

Remi: When setting the -XX:+UseNUMA option, I ended up having some other default options
overridden, including my -Xmx option. So the JVM started to use its default sizing
corresponding to my hardware (which is 64-bit and is a 64 GB server).

bq. But for sure, in my case, setting -XX:+UseNUMA do trigger an OOM.
   Are you sure it hits OOM or just the container being killed because of yarn.nodemanager.vmem-pmem-ratio
being breached? 

Remi: "OOM" is a misuse of language there. It is effectively a container-killed issue due to
the estimation/limitation of the container's virtual memory consumption.
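
The kill condition discussed here can be illustrated with a minimal sketch. It assumes the stock default of 2.1 for yarn.nodemanager.vmem-pmem-ratio and HotSpot's usual default maximum heap of roughly a quarter of physical memory when -Xmx is lost. The 4096 MB container size and the function names are hypothetical and are not taken from this thread.

```python
# Sketch of the NodeManager virtual-memory check. The container size is a
# hypothetical value; the 2.1 ratio is the stock YARN default, and the
# "quarter of RAM" heap rule is HotSpot's usual default when -Xmx is absent.

DEFAULT_VMEM_PMEM_RATIO = 2.1  # default of yarn.nodemanager.vmem-pmem-ratio


def default_max_heap_mb(physical_ram_mb: int) -> int:
    """Approximate default JVM max heap when no -Xmx is given (1/4 of RAM)."""
    return physical_ram_mb // 4


def vmem_limit_mb(container_pmem_mb: int,
                  ratio: float = DEFAULT_VMEM_PMEM_RATIO) -> float:
    """Virtual memory a container may use before the NodeManager kills it."""
    return container_pmem_mb * ratio


def is_killed_for_vmem(vmem_used_mb: float, container_pmem_mb: int,
                       ratio: float = DEFAULT_VMEM_PMEM_RATIO) -> bool:
    """True when the container breaches its virtual-memory allowance."""
    return vmem_used_mb > vmem_limit_mb(container_pmem_mb, ratio)


if __name__ == "__main__":
    heap_mb = default_max_heap_mb(64 * 1024)   # 64 GB server, -Xmx lost
    container_mb = 4096                        # hypothetical AM container size
    print("default max heap (MB):", heap_mb)
    print("vmem limit (MB):", vmem_limit_mb(container_mb))
    print("killed at 17 GB vmem:", is_killed_for_vmem(17 * 1024, container_mb))
```

Under these assumptions, a JVM that falls back to a heap of about 16 GB can reserve virtual memory in the range of the 17GB reported above, far beyond what a few-GB container is allowed, so the container is killed without any Java OutOfMemoryError being involved.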

bq. I'm pretty sure there is already some configuration variables one can set in its tez-site.xml
file to set this option so no need to have pig force this setting by code. For what i understand,
the real problem is not about -XX:+UseNUMA. The real problem is more that some options from
the tez configuration are ignored.
   TEZ_AM_LAUNCH_CMD_OPTS_DEFAULT is "-XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps
-XX:+UseNUMA -XX:+UseParallelGC", i.e. -XX:+UseNUMA is part of the default Tez AM options. In
Pig, we give preference to the mapreduce AM settings (if tez.am.launch.cmd-opts is not
overridden in tez-site.xml) and translate them to Tez instead of using the mentioned Tez
defaults. Since the mapreduce AM settings are always there from mapred-default.xml or
mapred-site.xml, -XX:+UseNUMA is never there. So this is about making use of the default Tez
settings in Pig. If -XX:+UseNUMA is problematic in a particular environment, it can be
overridden in tez-site.xml.
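
The precedence described in this paragraph can be sketched roughly as follows. This is an illustration only, not Pig's actual code: the function name resolve_am_opts and the dictionary-based configuration are hypothetical, and the -Xmx values in the usage section are just examples.

```python
# Illustrative sketch of the option precedence described above.
# resolve_am_opts and the dict-based configs are hypothetical names.

TEZ_AM_OPTS_KEY = "tez.am.launch.cmd-opts"
MR_AM_OPTS_KEY = "yarn.app.mapreduce.am.command-opts"
TEZ_AM_LAUNCH_CMD_OPTS_DEFAULT = (
    "-XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps "
    "-XX:+UseNUMA -XX:+UseParallelGC"
)


def resolve_am_opts(tez_site: dict, mapred_conf: dict) -> str:
    """Pick the JVM options for the Tez AM, following the described order."""
    # 1. An explicit override in tez-site.xml always wins.
    if TEZ_AM_OPTS_KEY in tez_site:
        return tez_site[TEZ_AM_OPTS_KEY]
    # 2. Otherwise the mapreduce AM setting is translated and used.
    if MR_AM_OPTS_KEY in mapred_conf:
        return mapred_conf[MR_AM_OPTS_KEY]
    # 3. The Tez default is only reached when neither is set.
    return TEZ_AM_LAUNCH_CMD_OPTS_DEFAULT


if __name__ == "__main__":
    mapred = {MR_AM_OPTS_KEY: "-Xmx1024m"}  # example value
    print(resolve_am_opts({}, mapred))      # mapreduce setting, no UseNUMA
    print(resolve_am_opts({TEZ_AM_OPTS_KEY: "-Xmx2g -XX:+UseNUMA"}, mapred))
```

Because the mapreduce setting is practically always present, step 3 is rarely reached, which is why -XX:+UseNUMA never appears unless tez.am.launch.cmd-opts is set explicitly.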

Remi: Using tez.am.launch.cmd-opts in tez-site.xml is the answer for the original author.
I'm no Tez expert, with or without Pig, because I use YARN and pure MapReduce version 2
jobs. But I was pretty sure such configuration variables already existed. My point was more
to have the author use what already existed rather than maybe having someone start work on
a patch that was not needed, or that could even have been a bad idea (forcing a parameter
in code rather than through simple user-environment configuration). It's just that in the
past I've seen some JIRAs implemented (like the one for CMX support, which is currently
being pushed into the future Pig 0.15) which I really think should not have been
implemented the way they are right now (more or less using the lzo name to pass the CMX
codec, with a hack in the lzo/cmx false/true encoding detection to make one call the other;
I'm not sure that would be stable for multiple jobs using both encodings in the same JVM,
since lots of compression codec configurations are static). I use -XX:+UseNUMA myself now
that I have set the right configuration variable so as not to lose my other Xmx settings,
and it also works pretty well for MapReduce v2 YARN jobs.

The real issue of why the Tez AM performed poorly without NUMA is still there and will be
tracked in the TEZ JIRA. You have raised some concerns and I don't have knowledgeable answers
for them at this point. So I moved this to 0.16 and will add this once we fully understand
the NUMA behavior and what is happening with and without NUMA in the Tez AM. 

     




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)




> Add -XX:+UseNUMA for Tez jobs
> -----------------------------
>
>                 Key: PIG-4555
>                 URL: https://issues.apache.org/jira/browse/PIG-4555
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Rohini Palaniswamy
>            Assignee: Rohini Palaniswamy
>             Fix For: 0.16.0
>
>
>     For very big Tez jobs (~50K tasks), AM quickly goes OOM without -XX:+UseNUMA. tez.am.launch.cmd-opts
default setting has that, but since pig gives preference to yarn.app.mapreduce.am.command-opts
if present (which usually it is),  -XX:+UseNUMA is not there. Need to add -XX:+UseNUMA if
we are picking up mapreduce setting.



