hadoop-mapreduce-issues mailing list archives

From "Sangjin Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-5957) AM throws ClassNotFoundException with job classloader enabled if custom output format/committer is used
Date Wed, 02 Jul 2014 15:42:25 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14050054#comment-14050054 ]

Sangjin Lee commented on MAPREDUCE-5957:
----------------------------------------

The gist of this issue concerns Configuration.getClass() and the thread context classloader
(TCCL). Currently MRApps.setJobClassLoader() installs the job classloader as both the
configuration classloader and the TCCL at the same time, so once setJobClassLoader() is called,
the job classloader is available in both contexts.

MAPREDUCE-5751 occurred because the job classloader was made available *too early as the
TCCL*. This issue occurs because the job classloader is made available *too late as the
configuration classloader*.
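A minimal, self-contained simulation of the "too late" timing, not actual Hadoop code. MiniConf is a hypothetical, simplified stand-in for org.apache.hadoop.conf.Configuration, modeled on the assumption that the real class captures the TCCL when it is constructed and resolves names with Class.forName(name, true, classLoader). The setJobClassLoader method here only mirrors the behavior described above (installing the loader in both places). A bootstrap-only loader plays the AM's system classloader, which cannot see user classes, and the application class loader plays the job classloader.

```java
// Simulation of the configuration-classloader timing problem (not Hadoop code).
class TimingDemo {
    // Hypothetical, simplified stand-in for org.apache.hadoop.conf.Configuration.
    static class MiniConf {
        private ClassLoader classLoader;

        MiniConf() {
            // Capture the TCCL at construction time, falling back to our own loader.
            ClassLoader tccl = Thread.currentThread().getContextClassLoader();
            classLoader = (tccl != null) ? tccl : MiniConf.class.getClassLoader();
        }

        void setClassLoader(ClassLoader loader) { classLoader = loader; }

        // Resolves names through the configuration classloader, not the current TCCL.
        Class<?> getClassByName(String name) throws ClassNotFoundException {
            return Class.forName(name, true, classLoader);
        }
    }

    // Plays the role of a user class such as com.foo.test.TestOutputFormat.
    static class UserOutputFormat {}

    // Sketch of the described MRApps.setJobClassLoader() behavior: both contexts at once.
    static void setJobClassLoader(MiniConf conf, ClassLoader jobLoader) {
        conf.setClassLoader(jobLoader);
        Thread.currentThread().setContextClassLoader(jobLoader);
    }

    static boolean canResolve(MiniConf conf, String name) {
        try {
            conf.getClassByName(name);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    // Returns {resolvedBeforeSetJobClassLoader, resolvedAfterSetJobClassLoader}.
    static boolean[] run() {
        Thread current = Thread.currentThread();
        ClassLoader saved = current.getContextClassLoader();
        // Parent null: delegates only to the bootstrap loader, so user classes are invisible.
        ClassLoader systemLoader = new ClassLoader(null) {};
        ClassLoader jobLoader = TimingDemo.class.getClassLoader();
        String name = UserOutputFormat.class.getName();
        try {
            current.setContextClassLoader(systemLoader);
            MiniConf conf = new MiniConf();           // created before the job loader is set
            boolean before = canResolve(conf, name);  // analogous to createOutputCommitter()
            setJobClassLoader(conf, jobLoader);       // "too late" for the lookup above
            boolean after = canResolve(conf, name);
            return new boolean[] {before, after};
        } finally {
            current.setContextClassLoader(saved);     // leave the real TCCL untouched
        }
    }

    public static void main(String[] args) {
        boolean[] r = run();
        System.out.println("before=" + r[0] + ", after=" + r[1]);
    }
}
```

The lookup made before setJobClassLoader() fails the same way the quoted stack trace does, while the identical lookup afterwards succeeds.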

If my understanding is correct, normal classloading (one class loading another through ordinary
references, or even via Class.forName) is unaffected by this.
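To illustrate the point about Class.forName: the one-argument Class.forName(String) resolves through the defining classloader of the calling class and does not consult the TCCL, whereas code that explicitly looks a name up through the TCCL depends on it. In this sketch a bootstrap-only loader stands in for a TCCL that cannot see user classes; the class names are hypothetical.

```java
// Shows that one-argument Class.forName ignores the thread context classloader.
class ForNameDemo {
    // Hypothetical user class that the application class loader can see.
    static class UserClass {}

    // Uses the defining loader of the caller (ForNameDemo), not the TCCL.
    static boolean plainForName(String name) {
        try {
            Class.forName(name);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    // Resolves explicitly through the given loader, as TCCL-aware code does.
    static boolean forNameVia(String name, ClassLoader loader) {
        try {
            Class.forName(name, true, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    // Returns {plainForNameWorks, tcclLookupWorks} while the TCCL cannot see UserClass.
    static boolean[] run() {
        Thread current = Thread.currentThread();
        ClassLoader saved = current.getContextClassLoader();
        String name = UserClass.class.getName();
        try {
            // Bootstrap-only TCCL: it can load java.* classes but not UserClass.
            current.setContextClassLoader(new ClassLoader(null) {});
            boolean plain = plainForName(name);
            boolean viaTccl = forNameVia(name, current.getContextClassLoader());
            return new boolean[] {plain, viaTccl};
        } finally {
            current.setContextClassLoader(saved);
        }
    }

    public static void main(String[] args) {
        boolean[] r = run();
        System.out.println("Class.forName: " + r[0] + ", via TCCL: " + r[1]);
    }
}
```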

I see two possible approaches for this:
(1) Separate the timing of installing the job classloader as the configuration classloader and
as the TCCL.
I think that while setting the TCCL should be delayed as long as possible (i.e., the current
timing), the job classloader can be installed as the configuration classloader much earlier. If
the configuration loads a user class, that's precisely what we need. If it loads a system class,
the job classloader will delegate to its parent anyway. I don't see any harm in setting the
configuration classloader early.
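Approach (1) can be sketched as follows, again under simplified assumptions and not as the actual MRAppMaster change. MiniConf is the same hypothetical stand-in for Configuration; the job loader is installed as the configuration classloader before the (simulated) service init, and the TCCL is set only afterwards, at the current late point.

```java
// Sketch of approach (1) under simplified assumptions (not the actual MRAppMaster code).
class EarlyInstallDemo {
    // Hypothetical, simplified stand-in for org.apache.hadoop.conf.Configuration.
    static class MiniConf {
        private ClassLoader classLoader;

        MiniConf() {
            ClassLoader tccl = Thread.currentThread().getContextClassLoader();
            classLoader = (tccl != null) ? tccl : MiniConf.class.getClassLoader();
        }

        void setClassLoader(ClassLoader loader) { classLoader = loader; }

        Class<?> getClassByName(String name) throws ClassNotFoundException {
            return Class.forName(name, true, classLoader);
        }
    }

    // Plays the role of a custom output format class.
    static class UserOutputFormat {}

    static boolean canResolve(MiniConf conf, String name) {
        try {
            conf.getClassByName(name);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    // Returns {userClassResolved, systemClassResolved, tcclWasStillSystemLoader}.
    static boolean[] run() {
        Thread current = Thread.currentThread();
        ClassLoader saved = current.getContextClassLoader();
        ClassLoader systemLoader = new ClassLoader(null) {}; // cannot see user classes
        ClassLoader jobLoader = EarlyInstallDemo.class.getClassLoader();
        try {
            current.setContextClassLoader(systemLoader);
            MiniConf conf = new MiniConf();

            // Early: install the job loader as the configuration classloader only.
            conf.setClassLoader(jobLoader);

            // Service init: configuration lookups now see user classes, and system
            // classes still resolve because the job loader delegates for them.
            boolean user = canResolve(conf, UserOutputFormat.class.getName());
            boolean system = canResolve(conf, "java.lang.String");
            boolean tcclUnchanged = current.getContextClassLoader() == systemLoader;

            // Late (current timing): only now install the job loader as the TCCL.
            current.setContextClassLoader(jobLoader);
            return new boolean[] {user, system, tcclUnchanged};
        } finally {
            current.setContextClassLoader(saved);
        }
    }

    public static void main(String[] args) {
        boolean[] r = run();
        System.out.println("user=" + r[0] + ", system=" + r[1] + ", tcclUnchanged=" + r[2]);
    }
}
```

All three flags come back true: the user class resolves during init even though the TCCL has not yet been switched.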

(2) Set and unset the job classloader around the code that loads classes from the configuration.
Identify the code points in MRAppMaster where Configuration.getClass() is needed, and set and
unset the job classloader around them. Although this would also solve the problem, the downside
is that someone has to determine, at each such point, that the job classloader is needed and
set/unset it accordingly. This is potentially brittle.
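For comparison, approach (2) amounts to a scoped set/restore around each lookup. The helper withJobClassLoader below is hypothetical, and MiniConf is again a simplified stand-in for Configuration; the sketch assumes the scope installs the job loader in both places and restores the previous values in a finally block.

```java
import java.util.function.Supplier;

// Sketch of approach (2): scope the job classloader around individual lookups.
class ScopedDemo {
    // Hypothetical, simplified stand-in for org.apache.hadoop.conf.Configuration.
    static class MiniConf {
        private ClassLoader classLoader;

        MiniConf() {
            ClassLoader tccl = Thread.currentThread().getContextClassLoader();
            classLoader = (tccl != null) ? tccl : MiniConf.class.getClassLoader();
        }

        void setClassLoader(ClassLoader loader) { classLoader = loader; }

        ClassLoader getClassLoader() { return classLoader; }

        Class<?> getClassByName(String name) throws ClassNotFoundException {
            return Class.forName(name, true, classLoader);
        }
    }

    static class UserOutputFormat {}

    // Hypothetical helper: install the job loader in both places, run, then restore.
    static <T> T withJobClassLoader(MiniConf conf, ClassLoader jobLoader, Supplier<T> action) {
        Thread current = Thread.currentThread();
        ClassLoader savedConf = conf.getClassLoader();
        ClassLoader savedTccl = current.getContextClassLoader();
        conf.setClassLoader(jobLoader);
        current.setContextClassLoader(jobLoader);
        try {
            return action.get();
        } finally {
            conf.setClassLoader(savedConf);
            current.setContextClassLoader(savedTccl);
        }
    }

    static boolean canResolve(MiniConf conf, String name) {
        try {
            conf.getClassByName(name);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    // Returns {resolvedInsideScope, resolvedAtUnwrappedCallSite}.
    static boolean[] run() {
        Thread current = Thread.currentThread();
        ClassLoader saved = current.getContextClassLoader();
        ClassLoader systemLoader = new ClassLoader(null) {}; // cannot see user classes
        ClassLoader jobLoader = ScopedDemo.class.getClassLoader();
        String name = UserOutputFormat.class.getName();
        try {
            current.setContextClassLoader(systemLoader);
            MiniConf conf = new MiniConf();
            boolean inside = withJobClassLoader(conf, jobLoader, () -> canResolve(conf, name));
            // A call site nobody remembered to wrap fails again.
            boolean unwrapped = canResolve(conf, name);
            return new boolean[] {inside, unwrapped};
        } finally {
            current.setContextClassLoader(saved);
        }
    }

    public static void main(String[] args) {
        boolean[] r = run();
        System.out.println("inside=" + r[0] + ", unwrapped=" + r[1]);
    }
}
```

The wrapped lookup succeeds, but any lookup outside a wrapper still fails, which is the brittleness described above.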

I think (1) is a more robust solution to this problem. Do you see an issue with taking that
approach?

I don't think the task (YarnChild) is affected by this.

> AM throws ClassNotFoundException with job classloader enabled if custom output format/committer is used
> -------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5957
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5957
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.4.0
>            Reporter: Sangjin Lee
>            Assignee: Sangjin Lee
>
> With the job classloader enabled, the MR AM throws ClassNotFoundException if a custom output format class is specified.
> {noformat}
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.foo.test.TestOutputFormat not found
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:473)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:374)
> 	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1459)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:415)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1456)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1389)
> Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.foo.test.TestOutputFormat not found
> 	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1895)
> 	at org.apache.hadoop.mapreduce.task.JobContextImpl.getOutputFormatClass(JobContextImpl.java:222)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:469)
> 	... 8 more
> Caused by: java.lang.ClassNotFoundException: Class com.foo.test.TestOutputFormat not found
> 	at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1801)
> 	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1893)
> 	... 10 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)
