Date: Tue, 15 Jul 2014 23:03:05 +0000 (UTC)
From: "Jason Lowe (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Subject: [jira] [Commented] (MAPREDUCE-5957) AM throws ClassNotFoundException with job classloader enabled if custom output format/committer is used

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062815#comment-14062815 ]

Jason Lowe commented on MAPREDUCE-5957:
---------------------------------------

Yeah, seeing the second patch I can understand why you initially gravitated to the first approach.
I agree that it is quite likely things will break as the code is maintained. Not only do we have to flip-flop each time a new custom class is loaded but also each time we invoke a method on a user-provided instance. So if a new committer method is added or what-not that needs to be called before we finally throw the classloader switch for good then that's another necessary flip-flop. That's annoying and probably won't be remembered when the new method invocation is added.

If we do stick with the flip-flop approach then it'd be nice if we had a nice way to wrap such code with a common flip-flop wrapper. I'm thinking something akin to how UserGroupInformation.doAs works so we can wrap code with common logic and don't have to copy-n-paste the wrapper everywhere. However the wrapped code has to be marshaled in the form of a Runnable or what-not, so that might not be much better in the end.

So I guess it comes down to weighing the likelihood this will ever be needed in practice or if simply setting the conf classloader will "just work." I'm not as worried about the reflection case since that's probably rare to do without already leveraging Hadoop's conf class property processing. However I'm worried about thread creation since I could see a case where a committer/speculator/whatever creates some threads in their constructor or some other method we invoke before throwing the final classloader switch. If those threads end up inheriting the system TCCL instead of the job classloader TCCL then we have a problem. If that indeed is what would happen then it comes down to how likely is it that user code will want to create threads in constructors/methods that are invoked before the final classloader switch. I don't know offhand, unfortunately.

If we can convince ourselves the thread use by user methods invoked before the final loader switch is either a non-issue or super unlikely then I think we should go for the simpler fix to set the conf loader early.
Otherwise I think we may be stuck doing the flip-flop case just for correctness' sake, and hopefully we can make that as painless and obvious as possible.

Which reminds me, the CommitterEventHandler creates a thread with which it invokes methods on the output committer. That thread is going to have the system TCCL since it was created before the final classloader switch, correct? If so would we also have problems if the output committer does lazy thread creation or class loading when commit methods are invoked as the job runs? Seems like the CommitterEventHandler event handling thread needs the job classloader, if specified.

> AM throws ClassNotFoundException with job classloader enabled if custom output format/committer is used
> -------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5957
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5957
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.4.0
>            Reporter: Sangjin Lee
>            Assignee: Sangjin Lee
>         Attachments: MAPREDUCE-5957.patch, MAPREDUCE-5957.patch, MAPREDUCE-5957.patch
>
>
> With the job classloader enabled, the MR AM throws ClassNotFoundException if a custom output format class is specified.
> {noformat}
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.foo.test.TestOutputFormat not found
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:473)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:374)
> 	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1459)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:415)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1456)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1389)
> Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.foo.test.TestOutputFormat not found
> 	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1895)
> 	at org.apache.hadoop.mapreduce.task.JobContextImpl.getOutputFormatClass(JobContextImpl.java:222)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:469)
> 	... 8 more
> Caused by: java.lang.ClassNotFoundException: Class com.foo.test.TestOutputFormat not found
> 	at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1801)
> 	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1893)
> 	... 10 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)
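
The comment above raises two points that a small example may make more concrete: a doAs-style wrapper for the classloader flip-flop, and the question of which TCCL a newly created thread ends up with. What follows is a minimal sketch using only the JDK, not code from any of the attached patches. The names `ClassLoaderSwitch`, `callWith`, and `loaderSeenByNewThread` are hypothetical, an empty `URLClassLoader` stands in for the job classloader, and the lambda syntax assumes Java 8 or later.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.concurrent.Callable;

public class ClassLoaderSwitch {

    /**
     * Hypothetical doAs-style helper: runs the action with the given loader
     * installed as the thread context classloader (TCCL), then restores the
     * previous TCCL even if the action throws.
     */
    public static <T> T callWith(ClassLoader loader, Callable<T> action) throws Exception {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        current.setContextClassLoader(loader);
        try {
            return action.call();
        } finally {
            current.setContextClassLoader(previous);
        }
    }

    /** Starts a new thread and reports the TCCL that thread observes. */
    public static ClassLoader loaderSeenByNewThread() throws InterruptedException {
        final ClassLoader[] seen = new ClassLoader[1];
        Thread t = new Thread(() -> seen[0] = Thread.currentThread().getContextClassLoader());
        t.start();
        t.join(); // join() makes the write to seen[0] visible to the caller
        return seen[0];
    }

    public static void main(String[] args) throws Exception {
        ClassLoader system = Thread.currentThread().getContextClassLoader();
        // Stand-in for the job classloader; a real one would point at the job jars.
        ClassLoader jobLoader = new URLClassLoader(new URL[0], system);

        // A thread created while the wrapper is active inherits the job loader.
        ClassLoader inside = callWith(jobLoader, ClassLoaderSwitch::loaderSeenByNewThread);
        // A thread created outside the wrapper inherits the creator's original TCCL.
        ClassLoader outside = loaderSeenByNewThread();

        System.out.println("created inside wrapper sees job loader: " + (inside == jobLoader));   // true
        System.out.println("created outside wrapper sees job loader: " + (outside == jobLoader)); // false
    }
}
```

The second result is the case the comment worries about: a new thread takes its TCCL from the thread that creates it, at creation time. A thread started before the switch, or by user code invoked outside such a wrapper, therefore keeps the original loader unless it is set explicitly.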