pig-dev mailing list archives

From "Rohini Palaniswamy (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-5246) Modify bin/pig about SPARK_HOME, SPARK_ASSEMBLY_JAR after upgrading spark to 2
Date Fri, 02 Jun 2017 17:01:04 GMT

    [ https://issues.apache.org/jira/browse/PIG-5246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16035045#comment-16035045 ]

Rohini Palaniswamy commented on PIG-5246:
-----------------------------------------

Users should not have to pass -sparkversion 1 or 2 to tell Pig which Spark version is in use.
The script should detect it. For Hadoop 1.x and 2.x this was done by checking for hadoop-core.jar,
and the same thing can be done here. We currently still have the problem of having to compile
the shims classes against different versions.
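
A minimal sketch of what such detection could look like in bin/pig. The function name
detect_spark_major_version is hypothetical, and the check rests on an assumption about the
Spark layout: that a 1.x install has lib/spark-assembly*.jar (as the current script expects)
while a 2.x install has no assembly jar and keeps spark-core*.jar under jars/. It is not taken
from any attached patch.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of bin/pig): guess the Spark major version
# from the directory layout under the given Spark home.
# Assumption: Spark 1.x has lib/spark-assembly*.jar,
#             Spark 2.x has jars/spark-core*.jar and no assembly jar.
detect_spark_major_version() {
    local spark_home="$1"
    # compgen -G succeeds only if the glob matches at least one path.
    if compgen -G "${spark_home}/lib/spark-assembly*.jar" > /dev/null; then
        echo 1
    elif compgen -G "${spark_home}/jars/spark-core*.jar" > /dev/null; then
        echo 2
    else
        echo "unknown"
        return 1
    fi
}
```

bin/pig could then call detect_spark_major_version "$SPARK_HOME" and branch on the result
instead of asking the user for a version flag.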

There is a hack I did internally for the HBase 0.94 to 0.98 migration so that HBaseStorage
could support both HBase 0.94 and 0.98 with the same pig jar during the migration. I have
attached the patch for it. It is more code and slightly convoluted, as each class now redirects
to the shims class based on version detection. For example, in Spark, JobMetricsListener will
redirect to JobMetricsListenerSpark1 or JobMetricsListenerSpark2. But for users it makes things
very simple, as they can use the same pig installation to run against any version. [~nkollar],
do you want to try this approach as part of PIG-5157 (Spark 2 support) and PIG-5191 (HBase 2
support)?

Similarly, we can add a target to compile against all versions of both Spark and HBase (and
Hadoop 3.0 in future if required) and create a pig.jar which will run with anything.



> Modify bin/pig about SPARK_HOME, SPARK_ASSEMBLY_JAR after upgrading spark to 2
> ------------------------------------------------------------------------------
>
>                 Key: PIG-5246
>                 URL: https://issues.apache.org/jira/browse/PIG-5246
>             Project: Pig
>          Issue Type: Bug
>            Reporter: liyunzhang_intel
>            Assignee: liyunzhang_intel
>         Attachments: PIG-5246.1.patch, PIG-5246.patch
>
>
> In bin/pig, we copy the assembly jar to Pig's classpath for Spark 1.6:
> {code}
> # For spark mode:
> # Please specify SPARK_HOME first so that we can locate $SPARK_HOME/lib/spark-assembly*.jar,
> # we will add spark-assembly*.jar to the classpath.
> if [ "$isSparkMode"  == "true" ]; then
>     if [ -z "$SPARK_HOME" ]; then
>        echo "Error: SPARK_HOME is not set!"
>        exit 1
>     fi
>     # Please specify SPARK_JAR which is the hdfs path of spark-assembly*.jar to allow YARN to cache spark-assembly*.jar on nodes so that it doesn't need to be distributed each time an application runs.
>     if [ -z "$SPARK_JAR" ]; then
>        echo "Error: SPARK_JAR is not set, SPARK_JAR stands for the hdfs location of spark-assembly*.jar. This allows YARN to cache spark-assembly*.jar on nodes so that it doesn't need to be distributed each time an application runs."
>        exit 1
>     fi
>     if [ -n "$SPARK_HOME" ]; then
>         echo "Using Spark Home: " ${SPARK_HOME}
>         SPARK_ASSEMBLY_JAR=`ls ${SPARK_HOME}/lib/spark-assembly*`
>         CLASSPATH=${CLASSPATH}:$SPARK_ASSEMBLY_JAR
>     fi
> fi
> {code}
> After upgrading to Spark 2.0, we may need to modify it.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
