pig-dev mailing list archives

From "Xuefu Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-4846) Use pigmix to test the performance of pig on spark
Date Mon, 18 Apr 2016 17:41:26 GMT

    [ https://issues.apache.org/jira/browse/PIG-4846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246099#comment-15246099 ]

Xuefu Zhang commented on PIG-4846:

The basic idea is to make maximum use of the given resources (memory and CPU). Whichever
resource is scarce should be allocated first; in your case, that is memory. In general, you
want at least 2G of memory per core for Spark, and 4, 5, or 6 cores per executor. In our case,
we set 4 cores and 8G of memory per executor. Of the executor memory, 15-20% generally goes to
memory overhead. Driver memory is less critical unless there is an OOM, in which case it needs
more memory. 2G is a good minimum.
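
To make the arithmetic concrete, here is a rough sketch of the sizing rule in Python. It is
only an illustration: the function name and the node sizes are made up, and it assumes that
the 8G per executor is the total budget, with the 15-20% overhead carved out of it. The
property names are the Spark-on-YARN settings of that era; check them against your Spark
version before using them.

```python
def size_executors(node_mem_gb, node_cores, cores_per_executor=4,
                   mem_per_core_gb=2, overhead_fraction=0.20):
    """Hypothetical helper: apply the 2G-per-core, 4-6 cores-per-executor rule of thumb."""
    if not 0 <= overhead_fraction < 1:
        raise ValueError("overhead_fraction must be in [0, 1)")

    # Total memory budget for one executor (assumed to include overhead).
    per_executor_mb = int(cores_per_executor * mem_per_core_gb * 1024)

    # 15-20% of the budget goes to memory overhead; the rest is the executor heap.
    overhead_mb = int(per_executor_mb * overhead_fraction)
    heap_mb = per_executor_mb - overhead_mb

    # The scarcer resource decides how many executors fit on one node.
    by_cores = node_cores // cores_per_executor
    by_memory = int(node_mem_gb * 1024) // per_executor_mb

    return {
        "executors_per_node": min(by_cores, by_memory),
        "limited_by": "memory" if by_memory < by_cores else "cores",
        "spark.executor.cores": cores_per_executor,
        "spark.executor.memory": f"{heap_mb}m",
        "spark.yarn.executor.memoryOverhead": overhead_mb,  # in MB
    }


if __name__ == "__main__":
    # Example node with 32 cores but only 32G of memory, so memory is the scarce resource.
    for key, value in size_executors(node_mem_gb=32, node_cores=32).items():
        print(f"{key} = {value}")
```

On a node like the one in the example, memory runs out before cores do, so the executor
count is set by memory and some cores stay idle. That is the "scarce resource first" idea.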

For more details, see the tuning doc I wrote for Hive on Spark, which was included in CDH5.7:
http://www.cloudera.com/documentation/enterprise/latest/topics/admin_hos_tuning.html
While that's for Hive on Spark, some of the configurations may apply to Pig as well.

Let me know if you have more questions.

> Use pigmix to test the performance of pig on spark
> --------------------------------------------------
>                 Key: PIG-4846
>                 URL: https://issues.apache.org/jira/browse/PIG-4846
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: liyunzhang_intel
>            Assignee: liyunzhang_intel
>             Fix For: spark-branch
>         Attachments: PIG-4846.patch, PIG-4846_1.patch
> We can use pigmix to compare the performance of mr mode and spark mode.
> For an introduction to pigmix, see https://cwiki.apache.org/confluence/display/PIG/PigMix
> PIG-4846.patch makes pigmix run with a specified exectype.

This message was sent by Atlassian JIRA
