pig-dev mailing list archives

From "liyunzhang_intel (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (PIG-5246) Modify bin/pig about SPARK_HOME, SPARK_ASSEMBLY_JAR after upgrading spark to 2
Date Mon, 05 Jun 2017 02:00:09 GMT

    [ https://issues.apache.org/jira/browse/PIG-5246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16036459#comment-16036459 ]

liyunzhang_intel edited comment on PIG-5246 at 6/5/17 1:59 AM:
---------------------------------------------------------------

[~rohini]: thanks for the suggestion. For spark1 and spark2, the version will be detected in the script
by checking for spark-assembly.jar or other markers, so users need not specify the version of Spark.
bq. For eg: In Spark JobMetricsListener will redirect to JobMetricsListenerSpark1 or JobMetricsListenerSpark2.
But for users it makes it very simple as they can use same pig installation to run against
any version.
That would be convenient for users, but I am not sure whether there will be conflicts if the
jars of both spark1 and spark2 are in the Pig classpath.
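As a rough illustration of the detection idea, here is a minimal sketch that guesses the Spark major version from the directory layout. The function name {{detect_spark_major}} is hypothetical and not part of bin/pig. It assumes a spark1 installation ships lib/spark-assembly*.jar and a spark2 installation ships its jars under jars/. If the assembly jar is absent from a spark1 build, this heuristic would misclassify it.

```shell
# Hypothetical helper (not in bin/pig): guess the Spark major version
# from the layout of the Spark home directory passed as $1.
detect_spark_major() {
    spark_home="$1"
    # Assumption: spark1 distributions ship lib/spark-assembly*.jar.
    for jar in "$spark_home"/lib/spark-assembly*.jar; do
        # An unmatched glob stays literal, so test that the file exists.
        if [ -e "$jar" ]; then
            echo 1
            return 0
        fi
    done
    # Assumption: spark2 distributions ship individual jars under jars/.
    for jar in "$spark_home"/jars/*.jar; do
        if [ -e "$jar" ]; then
            echo 2
            return 0
        fi
    done
    # Neither layout was recognized.
    echo unknown
    return 1
}
```

The script could call {{detect_spark_major "$SPARK_HOME"}} and branch on the result, so the user never has to state the Spark version explicitly.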
 [~zjffdu]:  
bq. Actually SPARK_ASSEMBLY_JAR is not a must-have thing for spark. 
  If SPARK_ASSEMBLY_JAR is not a must-have for spark1, how do we tell spark1 from spark2?
bq. IMO, pig don't need to specify that, it is supposed to be set in spark-defaults.conf which
would apply to all spark apps.
  Pig on Spark uses the Spark installation and copies $SPARK_HOME/lib/spark-assembly*jar (spark1)
and $SPARK_HOME/jars/*jar to Pig's classpath, but we don't read spark-defaults.conf.
 Instead, we parse pig.properties and save the Spark-related configuration to [SparkContext|https://github.com/apache/pig/blob/trunk/src/org/apache/pig/backend/hadoop/executionengine/spark/SparkLauncher.java#L584].

 


> Modify bin/pig about SPARK_HOME, SPARK_ASSEMBLY_JAR after upgrading spark to 2
> ------------------------------------------------------------------------------
>
>                 Key: PIG-5246
>                 URL: https://issues.apache.org/jira/browse/PIG-5246
>             Project: Pig
>          Issue Type: Bug
>            Reporter: liyunzhang_intel
>            Assignee: liyunzhang_intel
>         Attachments: HBase9498.patch, PIG-5246.1.patch, PIG-5246.patch
>
>
> In bin/pig, we copy the assembly jar to Pig's classpath for spark1.6:
> {code}
> # For spark mode:
> # Please specify SPARK_HOME first so that we can locate $SPARK_HOME/lib/spark-assembly*.jar,
> # we will add spark-assembly*.jar to the classpath.
> if [ "$isSparkMode"  == "true" ]; then
>     if [ -z "$SPARK_HOME" ]; then
>        echo "Error: SPARK_HOME is not set!"
>        exit 1
>     fi
>     # Please specify SPARK_JAR which is the hdfs path of spark-assembly*.jar to allow
>     # YARN to cache spark-assembly*.jar on nodes so that it doesn't need to be distributed
>     # each time an application runs.
>     if [ -z "$SPARK_JAR" ]; then
>        echo "Error: SPARK_JAR is not set, SPARK_JAR stands for the hdfs location of spark-assembly*.jar. This allows YARN to cache spark-assembly*.jar on nodes so that it doesn't need to be distributed each time an application runs."
>        exit 1
>     fi
>     if [ -n "$SPARK_HOME" ]; then
>         echo "Using Spark Home: " ${SPARK_HOME}
>         SPARK_ASSEMBLY_JAR=`ls ${SPARK_HOME}/lib/spark-assembly*`
>         CLASSPATH=${CLASSPATH}:$SPARK_ASSEMBLY_JAR
>     fi
> fi
> {code}
> After upgrading to spark2.0, we may need to modify it.
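
One way the classpath part of the block above might be adapted is sketched below. It is only an illustration, not the attached patch. The helper name {{spark_classpath_entries}} is hypothetical, and it assumes that jars under $SPARK_HOME/lib/spark-assembly*.jar (spark1) or $SPARK_HOME/jars/*.jar (spark2) are the ones to add. The SPARK_JAR check is left out on purpose, since how it should behave for spark2 is not settled here.

```shell
# Hypothetical helper (not in bin/pig): print a colon-separated list of the
# Spark jars found under the Spark home directory passed as $1.
spark_classpath_entries() {
    spark_home="$1"
    entries=""
    # Cover both layouts: the spark1 assembly jar and the spark2 jars/ directory.
    for jar in "$spark_home"/lib/spark-assembly*.jar "$spark_home"/jars/*.jar; do
        # An unmatched glob stays literal, so skip anything that does not exist.
        [ -e "$jar" ] || continue
        # Append with a ':' separator, but only after the first entry.
        entries="${entries:+$entries:}$jar"
    done
    # Fail when no Spark jar was found so the caller can report an error.
    [ -n "$entries" ] || return 1
    echo "$entries"
}
```

In bin/pig the result could then be appended in place of SPARK_ASSEMBLY_JAR, for example with {{CLASSPATH=${CLASSPATH}:$(spark_classpath_entries "$SPARK_HOME")}}, after checking the function's exit status.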



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
