spark-reviews mailing list archives

From dbtsai <...@git.apache.org>
Subject [GitHub] spark pull request: [SPARK-1870][branch-0.9] Jars added by sc.addJ...
Date Tue, 20 May 2014 04:47:47 GMT
GitHub user dbtsai opened a pull request:

    https://github.com/apache/spark/pull/834

    [SPARK-1870][branch-0.9] Jars added by sc.addJar are not in the default classLoader in
executor for YARN

    The summary is copied from Sandy's comment on the mailing list.
    
    The relevant difference between YARN and standalone is that, on YARN, the 
    app jar is loaded by the system classloader instead of Spark's custom URL
    classloader.
    
    On YARN, the system classloader knows about [the classes in the spark jars,
    the classes in the primary app jar].   The custom classloader knows about
    [the classes in secondary app jars] and has the system classloader as its
    parent.
    
    A few relevant facts (mostly redundant with what Sean pointed out):
    * Every class has a classloader that loaded it.
    * When an object of class B is instantiated inside of class A, the
    classloader used for loading B is the classloader that was used for loading A.
    * When a classloader fails to load a class, it lets its parent classloader
    try.  If its parent succeeds, its parent becomes the "classloader that
    loaded it".
    
    So suppose class B is in a secondary app jar and class A is in the primary
    app jar:
    1. The custom classloader will try to load class A.
    2. It will fail, because it only knows about the secondary jars.
    3. It will delegate to its parent, the system classloader.
    4. The system classloader will succeed, because it knows about the primary
    app jar.
    5. A's classloader will be the system classloader.
    6. A tries to instantiate an instance of class B.
    7. B will be loaded with A's classloader, which is the system classloader.
    8. Loading B will fail, because A's classloader, which is the system
    classloader, doesn't know about the secondary app jars.
    
    In Spark standalone, A and B are both loaded by the custom classloader, so
    this issue doesn't come up.
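    The failure in steps 5-8 can be reproduced in isolation. In this sketch, a class `B` compiled into a temp directory stands in for a class in a secondary app jar (hypothetical setup; it requires running on a JDK so the system Java compiler is available). The custom loader, which knows about the "secondary jar", finds `B`; the system loader, which plays the role of A's classloader, does not:

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.ToolProvider;

public class SecondaryJarDemo {
    public static void main(String[] args) throws Exception {
        // Compile a stand-in "secondary app jar" class B into a temp dir.
        Path dir = Files.createTempDirectory("secondary");
        Path src = dir.resolve("B.java");
        Files.write(src, "public class B {}".getBytes());
        ToolProvider.getSystemJavaCompiler().run(null, null, null, src.toString());

        ClassLoader system = ClassLoader.getSystemClassLoader();
        URLClassLoader custom =
                new URLClassLoader(new URL[] { dir.toUri().toURL() }, system);

        // The custom loader, which knows the secondary "jar", finds B ...
        Class.forName("B", true, custom);

        // ... but A's loader in the scenario is the system loader, which
        // doesn't know about the secondary jars, so step 8 fails:
        try {
            Class.forName("B", true, system);
        } catch (ClassNotFoundException e) {
            System.out.println("ClassNotFoundException, as in step 8");
        }
    }
}
```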
    
    In this PR, we don't use customClassLoader anymore. Instead, we add the jar
    URLs to the current classloader. Since addURL is a protected method in
    URLClassLoader, we call it through reflection.
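    A rough illustration of the idea (not the actual patch, which lives in Spark's executor code): `URLClassLoader.addURL` is protected, so the PR reaches it reflectively on the executor's existing loader. Because modern JVMs reject `setAccessible` on JDK internals, this portable demo exposes the same protected method through a trivial subclass instead; the hypothetical `addJar` hook marks exactly the call the reflection trick targets.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;

public class AddJarDemo {
    // Hypothetical loader with the same delegation chain, plus a public
    // hook for appending jar URLs after construction.
    static class MutableUrlClassLoader extends URLClassLoader {
        MutableUrlClassLoader(ClassLoader parent) {
            super(new URL[0], parent);
        }
        void addJar(URL url) {
            addURL(url); // the protected method the reflective call targets
        }
    }

    public static void main(String[] args) throws IOException {
        MutableUrlClassLoader loader =
                new MutableUrlClassLoader(ClassLoader.getSystemClassLoader());

        // Stand-in for a jar added by sc.addJar (an empty temp file; the
        // loader only records the URL until a class is actually requested).
        Path jar = Files.createTempFile("secondary-app", ".jar");
        loader.addJar(jar.toUri().toURL());

        System.out.println(loader.getURLs().length); // 1
    }
}
```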


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dbtsai/spark branch-0.9-dbtsai-classloader

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/834.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #834
    
----
commit 474ef2c936b8f659521a519c103bc7fdb116353b
Author: DB Tsai <dbtsai@alpinenow.com>
Date:   2014-05-20T04:34:58Z

    Fixed the classLoader issue in 0.9 branch.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
