hadoop-mapreduce-user mailing list archives

From "Bourgon, Armel" <Armel.Bour...@citygridmedia.com>
Subject Re: hadoop 2.6 jar command doesn't load text files in classpath
Date Fri, 16 Oct 2015 00:52:18 GMT
Hello Chris,

Thanks for your interest in this issue. I set up a local installation of 2.5.2 and I am actually
running into the same issue.
I will work on a jira ticket tomorrow.

Best
—
Armel Bourgon

> On Oct 15, 2015, at 3:41 PM, Chris Nauroth <cnauroth@hortonworks.com> wrote:
> 
> Hello Armel,
> 
> That's an interesting find.  Thank you for reporting it.
> 
> I know you mentioned an upgrade from 0.20 to 2.6.  Do you also have the
> ability to test against an earlier 2.x release, like 2.5.2?  If the
> problem repros in 2.6.0, but not in 2.5.2, then I wonder if it's a
> regression introduced by the client-side classloader isolation that we
> shipped in 2.6.0.
> 
> https://issues.apache.org/jira/browse/HADOOP-10893
> 
> 
> If you're interested in looking at the code, the most relevant piece is
> the RunJar class, which is the main entry point of the "hadoop jar"
> command.
> 
> If you have a simplified consistent repro that demonstrates the problem,
> then I suggest filing a JIRA for further investigation.  Thanks again.
> 
> --Chris Nauroth
> 
> 
> 
> 
> On 10/15/15, 2:40 PM, "Bourgon, Armel" <Armel.Bourgon@citygridmedia.com>
> wrote:
> 
>> Hello,
>> 
>> To give you a bit of context, I wrote a java library that aims to provide
>> an easy way to coordinate multiple MR jobs and execute them with a single
>> jar submission. The final result is a "fat jar" (built using the maven
>> assembly plugin) that contains the different Mapper and Reducer classes
>> and a Main class that has the logic to submit the different jobs to the
>> cluster.
>> 
>> To accomplish this, the Main relies on some text files (packaged in the
>> jar) to be present. Those files are not needed by the MR jobs themselves;
>> they are a kind of configuration that tells the Main how it should
>> schedule the different MR jobs.
>> 
>> The jar is executed like this:
>> hadoop jar the_jar_file.jar <args>
>> 
>> It has been used in production for a long time now but recently we
>> decided to upgrade to hadoop 2.6 (we were using 0.20). All our jobs
>> packaged like that are failing because the Main cannot locate the text
>> files in the classpath.
>> 
>> I did a bit of debugging by replacing the Main with a piece of code that
>> prints the content of the classpath. When running the jar with:
>> java -jar the_jar_file.jar <args>
>> 
>> I can see the text files in the list. But when I run the same jar with:
>> hadoop jar the_jar_file.jar <args>
>> 
>> The text files are missing. I assume that something changed in the way
>> the hadoop jar command reads the jar and builds the classpath. I found
>> someone complaining about the same issue on Stack Overflow
>> (http://stackoverflow.com/questions/31670390/accessing-jar-resource-when-run-in-hadoop)
>> but nobody replied.
>> 
>> I would like to be able to keep the same mechanism (keep those conf files
>> in the jar and access them at runtime from the classpath). Maybe there is
>> an option to alter the way the jar command behaves? Can someone point me
>> to the source code of the jar command?
>> 
>> Thanks!
>> 
> 
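
For anyone trying to reproduce this, the kind of debugging Main described in the original message could look roughly like the sketch below. It is not the code from the thread: the class name ClasspathProbe and the resource name "jobs.conf" are hypothetical. It prints the entries of java.class.path and then asks both the class's own loader and the thread context class loader for a packaged file, so the output of "java -jar" and "hadoop jar" can be compared side by side.

```java
import java.io.File;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class ClasspathProbe {

    // Entries of the JVM's java.class.path property (jar and directory paths).
    static List<String> classpathEntries() {
        String cp = System.getProperty("java.class.path", "");
        List<String> entries = new ArrayList<>();
        for (String entry : cp.split(Pattern.quote(File.pathSeparator))) {
            if (!entry.isEmpty()) {
                entries.add(entry);
            }
        }
        return entries;
    }

    // URL of the resource as seen by the given loader, or null if it is not visible.
    // A null loader stands for the bootstrap loader, so fall back to the system lookup.
    static URL find(ClassLoader loader, String resourceName) {
        if (loader == null) {
            return ClassLoader.getSystemResource(resourceName);
        }
        return loader.getResource(resourceName);
    }

    public static void main(String[] args) {
        // "jobs.conf" is a hypothetical name; pass the real file name as the first argument.
        String resource = args.length > 0 ? args[0] : "jobs.conf";

        for (String entry : classpathEntries()) {
            System.out.println("classpath entry: " + entry);
        }

        ClassLoader own = ClasspathProbe.class.getClassLoader();
        ClassLoader context = Thread.currentThread().getContextClassLoader();
        System.out.println("class loader:   " + own + " -> " + find(own, resource));
        System.out.println("context loader: " + context + " -> " + find(context, resource));
    }
}
```

Packaged as the jar's main class and run once with "java -jar" and once with "hadoop jar", a difference in the two "->" lines would indicate which loader, if any, still sees the file. That is only a way to narrow the problem down; it does not by itself show where the behavior changed.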

