hive-dev mailing list archives

From "chengxiang li" <chengxiang...@intel.com>
Subject Re: Review Request 30107: HIVE-9410, ClassNotFoundException occurs during hive query case execution with UDF defined[Spark Branch]
Date Fri, 23 Jan 2015 02:57:59 GMT


> On Jan. 23, 2015, 2:05 a.m., Xuefu Zhang wrote:
> > I'm wondering what's the story for Hive CLI. Hive CLI can add jars from the local file system. Would this work for Hive on Spark?

The Hive CLI adds jars to its classpath dynamically, the same way this patch does for the
RemoteDriver: it replaces the thread context classloader with one that includes the added
jar paths. For Hive on Spark, the Hive CLI side stays the same; the issue is that the
RemoteDriver does not add these jars to its own classpath, so a ClassNotFoundException is
thrown when the RemoteDriver side needs a related class.
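To make the mechanism concrete, here is a minimal sketch of how a thread context classloader can be extended with added jar URLs. The class and method names are hypothetical illustrations, not the patch's actual code:

```java
import java.net.URL;
import java.net.URLClassLoader;

public class ContextClassLoaderDemo {
    // Hypothetical helper: install a new context classloader that also
    // searches the given jar URLs, delegating to the previous loader.
    static ClassLoader addJarsToContextClassLoader(URL[] jarUrls) {
        ClassLoader current = Thread.currentThread().getContextClassLoader();
        URLClassLoader updated = new URLClassLoader(jarUrls, current);
        Thread.currentThread().setContextClassLoader(updated);
        return updated;
    }

    public static void main(String[] args) {
        ClassLoader before = Thread.currentThread().getContextClassLoader();
        ClassLoader after = addJarsToContextClassLoader(new URL[0]);
        // The new loader keeps the old one as its parent, so previously
        // visible classes still resolve while added jars become visible.
        System.out.println(after.getParent() == before);
    }
}
```

Because the new loader delegates to the old one, this is additive: nothing already loadable is lost, which is why resetting the context classloader on each "add jar" is safe.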


> On Jan. 23, 2015, 2:05 a.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java, line 367
> > <https://reviews.apache.org/r/30107/diff/4/?file=829688#file829688line367>
> >
> >     Callers of getBaseWork() will add the jars to the classpath. Why is this necessary?
> >     Who are the callers? Any side effect?

The reason we need to do this is that getBaseWork() generates the MapWork/ReduceWork, which
contain Hive operators, and a UDTFOperator may reference classes from the added jars that
need to be loadable. To load added jars dynamically, we need to reset the thread context
classloader. As mentioned in the previous change summary, unlike the Hive CLI, there are
two threads on the RemoteDriver side that may need to load added jar classes, and for the
akka thread there is no proper cut-in point for adding jars to its classpath.
The side effect is that many Hive CLI threads may have to check whether their classloader
needs updating unnecessarily.
Another possible solution is to update the system classloader of the RemoteDriver
dynamically, which must be done in a quite hacky way, such as:

        // Reflectively invoke the protected URLClassLoader.addURL() method to
        // append the added jar's URL to the system classloader.
        URLClassLoader sysloader = (URLClassLoader) ClassLoader.getSystemClassLoader();
        Class<URLClassLoader> sysclass = URLClassLoader.class;

        try {
            Method method = sysclass.getDeclaredMethod("addURL", URL.class);
            method.setAccessible(true);
            method.invoke(sysloader, jarUrl); // jarUrl: URL of the added jar
        } catch (Throwable t) {
            t.printStackTrace();
            throw new IOException("Error, could not add URL to system classloader", t);
        }

Which one do you prefer?


> On Jan. 23, 2015, 2:05 a.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java, line 220
> > <https://reviews.apache.org/r/30107/diff/4/?file=829689#file829689line220>
> >
> >     So, this is the code that adds the jars to the classpath of the remote driver?
> >     
> >     I'm wondering why these jars are necessary in order to deserialize SparkWork.

Same as the previous comment: SparkWork contains MapWork/ReduceWork, which contain the
operator tree, and a UDTFOperator needs to load classes from the added jars.
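To illustrate why deserialization fails without the added jars, here is a minimal sketch of resolving an operator's class through the thread context classloader, the way a deserializer typically does. The class names `ResolveDemo` and `com.example.MyUdf` are hypothetical:

```java
public class ResolveDemo {
    // Hypothetical: resolve a class by name via the thread context
    // classloader, as deserialization frameworks commonly do.
    static Class<?> resolve(String className) throws ClassNotFoundException {
        return Class.forName(className, true,
                Thread.currentThread().getContextClassLoader());
    }

    public static void main(String[] args) throws Exception {
        // A class already on the classpath resolves fine...
        System.out.println(resolve("java.lang.String").getSimpleName());
        // ...but a UDF class from a jar the loader has never seen does not.
        try {
            resolve("com.example.MyUdf"); // hypothetical added-jar class
        } catch (ClassNotFoundException e) {
            System.out.println("ClassNotFoundException");
        }
    }
}
```

So even though the RemoteDriver never calls UDF code directly, merely deserializing a SparkWork whose operator tree references a UDF class forces that class to be resolvable.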


- chengxiang


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30107/#review69329
-----------------------------------------------------------


On Jan. 22, 2015, 9:23 a.m., chengxiang li wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30107/
> -----------------------------------------------------------
> 
> (Updated Jan. 22, 2015, 9:23 a.m.)
> 
> 
> Review request for hive and Xuefu Zhang.
> 
> 
> Bugs: HIVE-9410
>     https://issues.apache.org/jira/browse/HIVE-9410
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> The RemoteDriver does not contain the added jars in its classpath, so it fails to
> deserialize SparkWork with a ClassNotFoundException. For Hive on MR, when a jar is added
> through the Hive CLI, Hive adds it to the CLI classpath (through the thread context
> classloader) and to the distributed cache as well. Compared to Hive on MR, Hive on Spark
> has an extra RemoteDriver component, so we should add the added jars to its classpath as
> well.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java d7cb111 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java 30a00a7 
>   spark-client/src/main/java/org/apache/hive/spark/client/JobContext.java 00aa4ec 
>   spark-client/src/main/java/org/apache/hive/spark/client/JobContextImpl.java 1eb3ff2 
>   spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 5f9be65 
>   spark-client/src/main/java/org/apache/hive/spark/client/SparkClientUtilities.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/30107/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> chengxiang li
> 
>

