hive-dev mailing list archives

From "Ratandeep Ratti (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HIVE-11878) ClassNotFoundException can possibly occur if multiple jars are registered in Hive
Date Fri, 18 Sep 2015 10:32:04 GMT
Ratandeep Ratti created HIVE-11878:
--------------------------------------

             Summary: ClassNotFoundException can possibly occur if multiple jars are registered in Hive
                 Key: HIVE-11878
                 URL: https://issues.apache.org/jira/browse/HIVE-11878
             Project: Hive
          Issue Type: Bug
          Components: Hive
    Affects Versions: 1.2.1
            Reporter: Ratandeep Ratti
            Assignee: Ratandeep Ratti


When we register a jar on the Hive console, Hive creates a fresh URLClassLoader that includes
the path of the jar being registered and all the jar paths of the parent class loader. The
parent class loader is the current ThreadContextClassLoader. Once the URLClassLoader is
created, Hive sets it as the current ThreadContextClassLoader.

So if we register multiple jars in Hive, multiple URLClassLoaders are created, each one
including the jars of its parent plus the one extra jar being registered. The last
URLClassLoader created ends up as the current ThreadContextClassLoader. (For details, see
org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath.)
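To make the chaining concrete, here is a minimal Java sketch of the strategy described above, using only the standard java.net.URLClassLoader API. It is an illustration, not the actual code of Utilities#addToClassPath; the class name AddToClassPathSketch, its addToClassPath method, and the jar paths /tmp/j1.jar and /tmp/j2.jar are hypothetical.

```java
import java.io.File;
import java.io.UncheckedIOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AddToClassPathSketch {

    // Hypothetical, simplified version of the strategy described in the issue
    // (NOT Hive's real implementation): build a fresh URLClassLoader holding the
    // parent's URLs plus the new jar, parented to the current thread context
    // class loader, then install it as the new thread context class loader.
    static URLClassLoader addToClassPath(String jarPath) {
        ClassLoader parent = Thread.currentThread().getContextClassLoader();
        List<URL> urls = new ArrayList<>();
        if (parent instanceof URLClassLoader) {
            // Copy all jar paths already known to the parent loader.
            urls.addAll(Arrays.asList(((URLClassLoader) parent).getURLs()));
        }
        try {
            urls.add(new File(jarPath).toURI().toURL());
        } catch (MalformedURLException e) {
            throw new UncheckedIOException(e);
        }
        URLClassLoader loader = new URLClassLoader(urls.toArray(new URL[0]), parent);
        Thread.currentThread().setContextClassLoader(loader);
        return loader;
    }

    public static void main(String[] args) {
        // Hypothetical jar paths; the files do not need to exist for this demo.
        URLClassLoader u1 = addToClassPath("/tmp/j1.jar");
        URLClassLoader u2 = addToClassPath("/tmp/j2.jar");

        System.out.println("u1 URLs: " + Arrays.toString(u1.getURLs()));
        System.out.println("u2 URLs: " + Arrays.toString(u2.getURLs()));
        System.out.println("u2's parent is u1: " + (u2.getParent() == u1));
        System.out.println("TCCL is u2: "
            + (Thread.currentThread().getContextClassLoader() == u2));
    }
}
```

After two registrations, *u2* holds every URL of *u1* plus *j2*, its parent is *u1*, and it is the thread context class loader, matching the chain described in the issue. On Java 8 the application class loader is itself a URLClassLoader, so *u1* will also carry the JVM classpath entries; on Java 9+ it will not.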

Here is an example in which this strategy can lead to a ClassNotFoundException (CNF).
We register two jars, *j1* and *j2*, in the Hive console. *j1* contains the UDF class *c1*,
which internally relies on class *c2* in jar *j2*. We register *j1* first: URLClassLoader *u1*
is created and set as the ThreadContextClassLoader. We register *j2* next: the new
URLClassLoader *u2* is created with *u1* as its parent, and *u2* becomes the new
ThreadContextClassLoader. Note that *u2* includes the paths to both *j1* and *j2*, whereas *u1*
only has the path to *j1* (for details, see org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath).

Now when we register class *c1* as a temporary function in Hive, the class is loaded using
{code} Class.forName("c1", true, Thread.currentThread().getContextClassLoader()) {code}.
The current ThreadContextClassLoader is *u2*, and it has the path to *c1*, but class loaders
delegate to their parent class loader first. So *u2* delegates to *u1*, which finds *c1*; in
this case *c1* is found and *defined* by class loader *u1*, not *u2*.
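The parent-first behaviour can be checked with a small, self-contained Java program. Since building real jars *j1* and *j2* is out of scope here, this sketch uses a JDK class as a stand-in: two empty URLClassLoaders are chained like *u1* and *u2*, and a class requested through the child turns out to be defined by an ancestor. The class name ParentFirstDemo and the helper definingLoader are hypothetical.

```java
import java.net.URL;
import java.net.URLClassLoader;

public class ParentFirstDemo {

    // Loads className through the given (initiating) loader, the way Hive calls
    // Class.forName with the thread context class loader, and returns the loader
    // that actually DEFINED the class.
    static ClassLoader definingLoader(String className, ClassLoader initiating) {
        try {
            return Class.forName(className, true, initiating).getClassLoader();
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        // Chain two empty URLClassLoaders like u1 <- u2 (u2's parent is u1).
        try (URLClassLoader u1 = new URLClassLoader(new URL[0], ClassLoader.getSystemClassLoader());
             URLClassLoader u2 = new URLClassLoader(new URL[0], u1)) {

            ClassLoader definer = definingLoader("java.util.ArrayList", u2);

            // u2 delegates to u1, u1 to the system loader, and so on up to the
            // bootstrap loader, which defines ArrayList; getClassLoader() reports
            // the bootstrap loader as null.
            System.out.println("defined by u2? " + (definer == u2)); // false
            System.out.println("defining loader: " + definer);      // null
        }
    }
}
```

In the Hive scenario the same delegation stops one level up: *u1* can find *c1* in *j1*, so *u1* becomes its defining loader even though the request went through *u2*.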

Now *c1* from jar *j1* has *u1* as its class loader. If a method in *c1* (say, initialize)
references class *c2*, *c2* will not be found: the JVM resolves *c2* through the class loader
that defined *c1*, which is *u1*, and neither *u1* nor its parents have the path to *j2*.
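The whole *j1*/*j2* scenario can be replayed as a toy simulation. This is a model under stated assumptions, not java.lang.ClassLoader: each hypothetical ToyLoader just holds a set of class names and delegates parent-first, and *u1*'s real parent (the loader Hive started with, which has neither jar) is modeled as null for brevity.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class DelegationModel {

    // Toy model of a class loader (NOT java.lang.ClassLoader): it "contains"
    // the class names from its jars and has an optional parent.
    static final class ToyLoader {
        final String name;
        final ToyLoader parent;
        final Set<String> classes;

        ToyLoader(String name, ToyLoader parent, String... classes) {
            this.name = name;
            this.parent = parent;
            this.classes = new HashSet<>(Arrays.asList(classes));
        }

        // Parent-first delegation: returns the loader that ends up DEFINING the
        // class, or null if no loader in the chain has it (a CNF in the real JVM).
        ToyLoader load(String className) {
            if (parent != null) {
                ToyLoader definer = parent.load(className);
                if (definer != null) {
                    return definer;
                }
            }
            return classes.contains(className) ? this : null;
        }
    }

    static String describe(ToyLoader definer) {
        return definer == null ? "ClassNotFoundException" : definer.name;
    }

    public static void main(String[] args) {
        // Registering j1 (contains c1) creates u1.
        ToyLoader u1 = new ToyLoader("u1", null, "c1");
        // Registering j2 (contains c2) creates u2: parent u1, paths of j1 and j2.
        ToyLoader u2 = new ToyLoader("u2", u1, "c1", "c2");

        // Class.forName("c1", true, TCCL) with TCCL == u2.
        ToyLoader c1Definer = u2.load("c1");
        System.out.println("c1 defined by " + c1Definer.name);                // u1

        // Looking c2 up through the TCCL would work...
        System.out.println("c2 via u2 (TCCL): " + describe(u2.load("c2")));   // u2
        // ...but c1's references are resolved through c1's defining loader, u1.
        System.out.println("c2 via c1's loader: " + describe(c1Definer.load("c2")));
        // prints "c2 via c1's loader: ClassNotFoundException"
    }
}
```

The contrast between the last two lines is the bug in miniature: *c2* is visible from the thread context class loader, but not from the loader that actually defined *c1*.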


I've added a qtest that demonstrates the problem. Please see the attached patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
