From: Ferdy Galema
Date: Wed, 05 Oct 2011 10:09:14 +0200
To: common-dev@hadoop.apache.org
Subject: Re: RunJar classloader issues

Bumping this thread because I now have a better understanding of what is actually happening.

If I understand correctly, when a job is submitted through RunJar, the classpath is extended by means of a new classloader. This classloader adds the unzipped contents of the jar to the classpath and is set as the current thread's contextClassLoader. This brings two issues to mind:

1) In RunJar, when constructing the new URLClassLoader, would it not be better to chain to the *previous* contextClassLoader instead of the system classloader? (The latter is used when the parent classloader argument is omitted in the URLClassLoader constructor, which is what RunJar does.) This is truly a minor issue, since most of the time RunJar is used as a result of invoking 'hadoop jar' from the command line, and therefore the previous thread contextClassLoader will actually be the system classloader. I bring this up mainly to try to understand the process.

2) To follow up on my previous findings on AbstractMapWritable: I think it is unable to find classes because it is loaded by a parent classloader (the system classloader) instead of the new child classloader set by RunJar. The classloader of AbstractMapWritable is not this child classloader because the class is already loaded (indirectly, in Configuration) before the thread contextClassLoader is replaced in RunJar; therefore it is unable to find certain extracted classes. So why does AbstractMapWritable use the classloader of its own class [Class.forName(className)] instead of that of the current thread [Class.forName(className, true, Thread.currentThread().getContextClassLoader())]? Is it not wiser to always use the latter construction in general classloading code?

Ferdy.

On 09/09/2011 11:54 AM, Ferdy Galema wrote:
> Sometimes when running hadoop jobs using the 'hadoop jar' command
> there are issues with the classloader. I presume these are caused by
> classes that are loaded BEFORE the command's main is invoked. For
> example, when relying on the MapWritable in the command, it is not
> possible to use a class that is not in the default idToClassMap.
> MapWritable.class is loaded before the user job is unpacked and
> therefore its classloader will not be able to find custom classes
> (at least, classes that are on the classpath of RunJar's classloader).
>
> I could not find any issues or discussion about this so I assume it is
> somewhat of an obscure issue (please correct me if I'm wrong). Anyway,
> I would like to hear what you think of this and perhaps discuss a
> possible solution, such as spawning the command in a new JVM.
> MapWritable, or rather AbstractMapWritable, uses a
> Class.forName(className) construction; maybe this can be changed so
> that it uses the classloader of the current thread instead of that of
> its own class. (Will this work?)
>
> A workaround for now is to explicitly put the jar itself on the
> classpath, i.e. 'env HADOOP_CLASSPATH=myJar hadoop jar myJar'.
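
For reference, the two lookup forms and the parent-chaining question discussed in this thread can be sketched in plain Java. This is only an illustration under stated assumptions, not the RunJar or AbstractMapWritable code: ContextLoaderSketch, RecordingLoader, and the class name hypothetical.job.CustomWritable are all made up for the example. RecordingLoader stands in for the child classloader that RunJar installs; it finds nothing and merely records which names it is asked for, so it shows which form consults the child at all.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

public class ContextLoaderSketch {

    /** Hypothetical child loader: finds nothing, only records what it is asked for. */
    public static final class RecordingLoader extends ClassLoader {
        public final List<String> requested = new ArrayList<>();

        public RecordingLoader(ClassLoader parent) {
            super(parent);
        }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            // Called by loadClass only after the parent chain failed to find the class.
            requested.add(name);
            throw new ClassNotFoundException(name);
        }
    }

    /** Issue 1: pass the current context loader as parent instead of omitting it. */
    public static URLClassLoader chainToContextLoader(URL[] urls) {
        return new URLClassLoader(urls, Thread.currentThread().getContextClassLoader());
    }

    /** Issue 2, first form: resolved through the loader that defined this class. */
    public static boolean foundViaDefiningLoader(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    /** Issue 2, second form: resolved through the thread's context loader. */
    public static boolean foundViaContextLoader(String className) {
        try {
            Class.forName(className, true, Thread.currentThread().getContextClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Thread thread = Thread.currentThread();
        ClassLoader previous = thread.getContextClassLoader();
        RecordingLoader child = new RecordingLoader(previous);
        thread.setContextClassLoader(child); // mimics what RunJar does before invoking main
        try {
            String name = "hypothetical.job.CustomWritable"; // assumed absent from every classpath
            foundViaDefiningLoader(name);
            System.out.println("child consulted by first form:  " + child.requested.contains(name));
            foundViaContextLoader(name);
            System.out.println("child consulted by second form: " + child.requested.contains(name));
            System.out.println("chained parent is the child:    "
                    + (chainToContextLoader(new URL[0]).getParent() == child));
        } finally {
            thread.setContextClassLoader(previous);
        }
    }
}
```

In this sketch only the second form should ever reach the child loader, which matches the behaviour described for AbstractMapWritable: a class defined by the parent loader never asks the child when it calls Class.forName(className).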