Mailing-List: contact common-dev-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: common-dev@hadoop.apache.org
Received-SPF: pass (athena.apache.org: local policy)
Message-ID: <4E8C10AA.4040001@kalooga.com>
Date: Wed, 05 Oct 2011 10:09:14 +0200
From: Ferdy Galema <ferdy.galema@kalooga.com>
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US;
 rv:1.9.2.21) Gecko/20110831 Thunderbird/3.1.13
MIME-Version: 1.0
To: common-dev@hadoop.apache.org
Subject: Re: RunJar classloader issues
References: <4E69E238.4080009@kalooga.com>
In-Reply-To: <4E69E238.4080009@kalooga.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Bumping this thread because I now have a better understanding of what is
actually happening. If I understand correctly, when submitting jobs using
RunJar the classpath is extended using a new classloader. This classloader
adds the unzipped contents of the jar to the classpath and is set as the
current thread's contextClassLoader. This brings two issues to mind:

1) In RunJar, when constructing the new URLClassLoader, would it not be
better to chain the *previous* contextClassLoader as the parent instead of
using the system classloader? (The latter is used when the classloader
argument is omitted in the ctor of URLClassLoader, which is what RunJar
does.) This is truly a minor issue, since most of the time RunJar is used
as a result of invoking 'hadoop jar' from the command line, and therefore
the previous thread contextClassLoader will actually be the system
classloader. I bring this up mainly to understand the process.
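
To make the difference between the two constructors concrete, here is a
minimal sketch. It is not RunJar's code: the class and method names are
made up for illustration, and an empty URL array stands in for the
unpacked jar directory. It only shows which parent each URLClassLoader
constructor ends up with.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical demo class; not part of Hadoop.
public class ParentLoaderChoice {

    /** Parent of a URLClassLoader built without a parent argument. */
    public static ClassLoader parentWhenOmitted() {
        // The one-argument ctor falls back to the default delegation parent,
        // which is the system classloader.
        try (URLClassLoader loader = new URLClassLoader(new URL[0])) {
            return loader.getParent();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Parent of a URLClassLoader that explicitly chains the given loader. */
    public static ClassLoader parentWhenChained(ClassLoader previous) {
        try (URLClassLoader loader = new URLClassLoader(new URL[0], previous)) {
            return loader.getParent();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ClassLoader previous = Thread.currentThread().getContextClassLoader();
        System.out.println("omitted parent is the system loader: "
                + (parentWhenOmitted() == ClassLoader.getSystemClassLoader()));
        System.out.println("chained parent is the previous context loader: "
                + (parentWhenChained(previous) == previous));
    }
}
```

When launched from the command line both parents are normally the same
object, which is why the choice only matters if something has already
replaced the contextClassLoader before RunJar runs.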

2) To follow up on my previous findings on AbstractMapWritable: I think
it is unable to find classes because it is loaded by a parent classloader
(the system classloader) instead of the new child classloader set by
RunJar. The classloader of AbstractMapWritable is not this child
classloader because the class is already loaded (indirectly, in
Configuration) before the thread contextClassLoader is replaced in
RunJar; therefore it is unable to find certain extracted classes. So why
does AbstractMapWritable use the classloader of its own class
[Class.forName(className)] instead of that of the current thread
[Class.forName(className, true,
Thread.currentThread().getContextClassLoader())]? Is it not wiser to
always use the latter construction in general classloading code?
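
The two lookup styles can be compared with a small self-contained sketch.
Everything in it is hypothetical: RecordingLoader stands in for the child
classloader that RunJar installs, and childConsulted is just a helper for
the demonstration. It installs the child as the contextClassLoader,
performs one lookup, and reports whether the child was ever asked for the
class.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class; not part of Hadoop.
public class ContextLoaderLookup {

    /** Stand-in child loader: records each requested name, then delegates to its parent. */
    static final class RecordingLoader extends ClassLoader {
        final List<String> requested = Collections.synchronizedList(new ArrayList<>());

        RecordingLoader(ClassLoader parent) {
            super(parent);
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            requested.add(name);
            return super.loadClass(name, resolve);
        }
    }

    /** Returns true if the child context loader was asked to load className. */
    public static boolean childConsulted(String className, boolean useContextLoader) {
        Thread thread = Thread.currentThread();
        ClassLoader previous = thread.getContextClassLoader();
        RecordingLoader child = new RecordingLoader(previous);
        thread.setContextClassLoader(child);
        try {
            if (useContextLoader) {
                // Lookup starts at the thread's context loader (the child).
                Class.forName(className, true, thread.getContextClassLoader());
            } else {
                // Lookup starts at the loader that defined the calling class.
                Class.forName(className);
            }
            return child.requested.contains(className);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        } finally {
            thread.setContextClassLoader(previous);
        }
    }

    public static void main(String[] args) {
        System.out.println("Class.forName(name) consulted child: "
                + childConsulted("java.util.BitSet", false));
        System.out.println("Class.forName(name, true, ctx) consulted child: "
                + childConsulted("java.util.BitSet", true));
    }
}
```

With the one-argument form the child is bypassed entirely, so anything
only the child can see (such as classes extracted from the job jar) would
not be found by a class that was loaded by the parent.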

Ferdy.

On 09/09/2011 11:54 AM, Ferdy Galema wrote:
> Sometimes when running hadoop jobs using the 'hadoop jar' command
> there are issues with the classloader. I presume these are caused by
> classes that are loaded BEFORE the command's main is invoked. For
> example, when relying on MapWritable in the command, it is not
> possible to use a class that is not in the default idToClassMap.
> MapWritable.class is loaded before the user job is unpacked and
> therefore its classloader will not be able to find custom classes
> (at least, classes that are on the classpath of RunJar's classloader).
>
> I could not find any issues or discussion about this so I assume it is 
> somewhat of an obscure issue (please correct me if I'm wrong). Anyway 
> I would like to hear what you think of this and perhaps discuss a
> possible solution, such as spawning the command in a new JVM.
> MapWritable, or rather AbstractMapWritable, uses a
> Class.forName(className) construction; maybe this can be changed so
> that it uses the classloader of the current thread instead of that of
> its own class. (Will this work?)
>
> A workaround for now is to explicitly put the jar itself on the
> classpath, i.e. 'env HADOOP_CLASSPATH=myJar hadoop jar myJar'.