hadoop-mapreduce-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-5957) AM throws ClassNotFoundException with job classloader enabled if custom output format/committer is used
Date Tue, 15 Jul 2014 23:03:05 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062815#comment-14062815
] 

Jason Lowe commented on MAPREDUCE-5957:
---------------------------------------

Yeah, seeing the second patch I can understand why you initially gravitated to the first approach.
 I agree that it is quite likely things will break as the code is maintained.  Not only do
we have to flip-flop each time a new custom class is loaded but also each time we invoke a
method on a user-provided instance.  So if a new committer method is added or what-not that
needs to be called before we finally throw the classloader switch for good then that's another
necessary flip-flop.  That's annoying and probably won't be remembered when the new method
invocation is added.

If we do stick with the flip-flop approach then it'd be nice if we had a nice way to wrap
such code with a common flip-flop wrapper.  I'm thinking something akin to how UserGroupInformation.doAs
works so we can wrap code with common logic and don't have to copy-n-paste the wrapper everywhere.
 However the wrapped code has to be marshaled in the form of a Runnable or what-not, so that
might not be much better in the end.

So I guess it comes down to weighing the likelihood this will ever be needed in practice or
if simply setting the conf classloader will "just work."  I'm not as worried about the reflection
case since that's probably rare to do without already leveraging Hadoop's conf class property
processing.  However I'm worried about thread creation since I could see a case where a committer/speculator/whatever
creates some threads in their constructor or some other method we invoke before throwing the
final classloader switch.  If those threads end up inheriting the system TCCL instead of the
job classloader TCCL then we have a problem.  If that indeed is what would happen then it
comes down to how likely is it that user code will want to create threads in constructors/methods
that are invoked before the final classloader switch.  I don't know offhand, unfortunately.

If we can convince ourselves the thread use by user methods invoked before the final loader
switch is either a non-issue or super unlikely then I think we should go for the simpler fix
to set the conf loader early.  Otherwise I think we may be stuck doing the flip flop case
just for correctness sake, and hopefully we can make that as painless and obvious as possible.

Which reminds me, the CommitterEventHandler creates a thread with which it invokes methods
on the output committer.  That thread is going to have the system TCCL since it was created
before the final classloader switch, correct?  If so would we also have problems if the output
committer does lazy thread creation or class loading when commit methods are invoked as the
job runs?  Seems like the CommitterEventHandler event handling thread needs the job classloader,
if specified.



> AM throws ClassNotFoundException with job classloader enabled if custom output format/committer
is used
> -------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5957
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5957
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.4.0
>            Reporter: Sangjin Lee
>            Assignee: Sangjin Lee
>         Attachments: MAPREDUCE-5957.patch, MAPREDUCE-5957.patch, MAPREDUCE-5957.patch
>
>
> With the job classloader enabled, the MR AM throws ClassNotFoundException if a custom
output format class is specified.
> {noformat}
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException:
Class com.foo.test.TestOutputFormat not found
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:473)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:374)
> 	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1459)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:415)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1456)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1389)
> Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.foo.test.TestOutputFormat
not found
> 	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1895)
> 	at org.apache.hadoop.mapreduce.task.JobContextImpl.getOutputFormatClass(JobContextImpl.java:222)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:469)
> 	... 8 more
> Caused by: java.lang.ClassNotFoundException: Class com.foo.test.TestOutputFormat not
found
> 	at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1801)
> 	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1893)
> 	... 10 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Mime
View raw message