hadoop-common-issues mailing list archives

From "Sangjin Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13070) classloading isolation improvements for stricter dependencies
Date Mon, 12 Dec 2016 19:18:58 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15742855#comment-15742855 ]

Sangjin Lee commented on HADOOP-13070:

I’ve been testing the latest patch (HADOOP-13070) and there seems to be one interesting
issue with the stricter classpath isolation, and that has to do with the {{ServiceLoader}}.

(1) {{ServiceLoader}}
The {{ServiceLoader}} essentially uses the following pattern to load service classes dynamically:
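Enumeration<URL> defs = classloader.getResources("META-INF/services/org.foo.ServiceInterface");
while (defs.hasMoreElements()) {
  URL def = defs.nextElement();
  Iterator<String> names = parse(def);  // provider class names listed in the service file
  while (names.hasNext()) {
    String name = names.next();
    // the caller of Class.forName() here is ServiceLoader, a system class
    ServiceInterface si = (ServiceInterface) Class.forName(name, false, classloader).newInstance();
  }
}
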
First off, {{ClassLoader.getResources()}} will return all service files, *regardless of* where
the service file is, either in the user classpath or the parent classpath (bit more discussion
on {{ClassLoader.getResources()}} below).
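
As a rough illustration of this behavior, the sketch below sets up two plain {{URLClassLoader}} instances, one standing in for the parent classpath and one for the user classpath, and asks the user loader for the service file. The class name {{ServiceFileLookupDemo}}, the provider names and the temp directories are made up for the example; it does not use {{ApplicationClassLoader}} itself.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class; plain URLClassLoaders stand in for the two classpaths.
class ServiceFileLookupDemo {
  static final String SERVICE_FILE = "META-INF/services/org.foo.ServiceInterface";

  // Creates a temp directory holding one service file that names the given provider.
  static URL dirWithServiceFile(String prefix, String providerName) throws IOException {
    Path root = Files.createTempDirectory(prefix);
    Path file = root.resolve(SERVICE_FILE);
    Files.createDirectories(file.getParent());
    Files.write(file, providerName.getBytes(StandardCharsets.UTF_8));
    return root.toUri().toURL();
  }

  // Returns the provider names found through the user loader, in lookup order.
  static List<String> visibleProviders() {
    try (URLClassLoader parent = new URLClassLoader(
             new URL[] {dirWithServiceFile("parent", "org.foo.ParentImpl")}, null);
         URLClassLoader user = new URLClassLoader(
             new URL[] {dirWithServiceFile("user", "org.foo.UserImpl")}, parent)) {
      List<String> providers = new ArrayList<>();
      // getResources() consults the parent loader as well as this loader's own URLs
      for (URL url : Collections.list(user.getResources(SERVICE_FILE))) {
        try (BufferedReader reader = new BufferedReader(
                 new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
          providers.add(reader.readLine());
        }
      }
      return providers;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(visibleProviders());
  }
}
```

Both service files come back from the single call on the user loader, and with the default {{ClassLoader.getResources()}} the parent's entry is listed before the user's.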

Since all service files have been located and the calling class of {{Class.forName()}} is
{{ServiceLoader}} which is a system class, all service classes will be successfully loaded,
*regardless of* where the service class is, either in the user classpath or the parent classpath.

Technically this would represent an opportunity to circumvent the isolation and load stuff
from the parent classpath. That said, we could still regard this as a variation of a “system
facility providing a way to load things from both classpaths” case mentioned in the proposal
(section 2-1).

I thought about plugging this possibility, but there doesn’t seem to be an unambiguous way
to do this.

One approach I considered is to walk up the call stack to identify who’s calling {{ServiceLoader.load()}}.
Suppose we use the calling class to enforce stricter loading. If a user class is the calling
class, it would still find service files from both the user classpath and the parent classpath.
However, as it iterates over the classes, it will fail to load a non-system parent class.
This causes a *hard* failure on the iteration on {{ServiceLoader}}.
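
To show what that hard failure looks like, here is a small sketch. It does not implement the caller-based check; instead it uses a service file that names a class the loader cannot find as a stand-in for a parent class that isolation would refuse to load. The class name {{ServiceLoaderHardFailureDemo}} and the provider name {{org.foo.MissingProvider}} are invented for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;
import java.util.ServiceConfigurationError;
import java.util.ServiceLoader;

// Hypothetical demo class; the unloadable provider stands in for a blocked parent class.
class ServiceLoaderHardFailureDemo {
  // Returns true if iterating the ServiceLoader throws ServiceConfigurationError.
  static boolean iterationFails() {
    try {
      Path root = Files.createTempDirectory("svc");
      Path file = root.resolve("META-INF/services/java.lang.Runnable");
      Files.createDirectories(file.getParent());
      // The service file is visible, but the class it names cannot be loaded.
      Files.write(file, "org.foo.MissingProvider".getBytes(StandardCharsets.UTF_8));
      try (URLClassLoader loader =
               new URLClassLoader(new URL[] {root.toUri().toURL()}, null)) {
        Iterator<Runnable> it = ServiceLoader.load(Runnable.class, loader).iterator();
        try {
          while (it.hasNext()) {
            it.next();
          }
          return false;
        } catch (ServiceConfigurationError e) {
          // An Error, not a checked exception: callers rarely expect or handle it.
          return true;
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("iteration failed: " + iterationFails());
  }
}
```

Because the service file is found but the class load fails, the iterator throws {{ServiceConfigurationError}} rather than quietly skipping the entry, which is the situation the calling-class approach would create for user code.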

On the other hand, we could try to determine somehow whether a certain service file is a “non-system
parent service file” and not return that service file resource from {{ClassLoader.getResources()}}
to begin with. However, the notion of a “non-system parent service file” is not well defined,
and I don’t think there is a way to define this clearly.

I think the best way forward is to allow {{ServiceLoader}} to load services from both the
user and the parent classpath. I’d love to hear your thoughts on this.

(2) {{ClassLoader.getResources()}}
Currently {{ApplicationClassLoader}} does not override this. The javadoc for {{ClassLoader.getResources()}}
says: "The search order is described in the documentation for getResource(String)."
Since we do not override this today, we return the resources from the parent first and then
from the child, which is not quite the same as what the javadoc indicates. So it seems to
me that at minimum we want to change the order of resources so that it returns the child resources
first.
The next question is whether it should return a (non-system) parent resource if a user class
calls this method. We could tighten this to filter out non-system parent resource. I am leaning
towards making that change.

Thoughts? Feedback? Concerns?
cc [~busbey]

> classloading isolation improvements for stricter dependencies
> -------------------------------------------------------------
>                 Key: HADOOP-13070
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13070
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: util
>            Reporter: Sangjin Lee
>            Assignee: Sangjin Lee
>            Priority: Critical
>         Attachments: HADOOP-13070.poc.01.patch, Test.java, TestDriver.java, classloading-improvements-ideas-v.3.pdf,
classloading-improvements-ideas.pdf, classloading-improvements-ideas.v.2.pdf, lib.jar
> Related to HADOOP-11656, we would like to make a number of improvements in terms of classloading
isolation so that user-code can run safely without worrying about dependency collisions with
the Hadoop dependencies.
> By the same token, it should raise the quality of the user code and its specified classpath
so that users get clear signals if they specify incorrect classpaths.
> This will contain a proposal that will include several improvements, some of which may
not be backward compatible. As such, it should be targeted to the next major revision of Hadoop.

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org
