hadoop-hdfs-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: why does hadoop creates /tmp/hadoop-user/hadoop-unjar-xxxx/ dir and unjar my fat jar?
Date Sun, 26 Oct 2014 03:10:50 GMT
If you use 'hadoop jar' to invoke your application, this is the default
behaviour. It is done because the utility supports a jars-within-jar
feature, which lets you pack additional dependency jars into an
application as a lib/ subdirectory under the root of the main jar.
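As a sketch, the layout RunJar expects for the jars-within-jar feature looks like the following (all class, jar, and directory names here are illustrative placeholders, not anything from the original thread):

```shell
# Illustrative layout for the jars-within-jar feature that 'hadoop jar'
# (RunJar) supports: compiled classes at the jar root, dependency jars
# packed under lib/.
mkdir -p myjob/com/example myjob/lib
touch myjob/com/example/MyJob.class      # placeholder for a compiled class
touch myjob/lib/guava.jar                # placeholder dependency jars
touch myjob/lib/commons-lang.jar
# Package and run (requires the JDK 'jar' tool and a Hadoop install):
#   (cd myjob && jar cf ../myjob.jar .)
#   hadoop jar myjob.jar com.example.MyJob
```

Because the dependencies stay packed as whole jars under lib/, extracting this jar creates only a handful of entries rather than one file per .class.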

This behaviour is not currently configurable, so given your inode issue
you have two options: use the jars-within-jar feature, which avoids
extracting massive numbers of .class files because the dependencies stay
packed inside the jar's lib/ directory, or skip 'hadoop jar' (the RunJar
utility) entirely and invoke your application directly with the generated
classpath:

java -cp $(hadoop classpath):my-fat-jar-with-all-dependencies.jar
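One caveat worth noting with this approach: unlike 'hadoop jar', plain 'java' does not read the Main-Class entry from a jar that is merely on the classpath, so the driver class must be named explicitly. A sketch (the driver class name is an illustrative placeholder, and the command is echoed rather than executed since it needs a real Hadoop install):

```shell
# Build the classpath from the Hadoop client jars plus the fat jar, then
# name the driver class explicitly; no unjarring happens in this path.
# The fallback after || is an illustrative stand-in for systems without
# 'hadoop' on the PATH.
CP="$(hadoop classpath 2>/dev/null || echo /etc/hadoop/conf):my-fat-jar-with-all-dependencies.jar"
echo "java -cp $CP com.example.MyDriver"
```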

On Sat, Oct 25, 2014 at 3:17 PM, Yang <teddyyyy123@gmail.com> wrote:

> I thought this might be because hadoop wants to pack everything
> (including the -files dfs cache files) into one single jar, so I removed
> the -files commands I had.
> But it still extracts the jar. This is rather confusing.
> On Fri, Oct 24, 2014 at 11:51 AM, Yang <teddyyyy123@gmail.com> wrote:
>> I just noticed that when I run "hadoop jar
>> my-fat-jar-with-all-dependencies.jar", it unjars the job jar into
>> /tmp/hadoop-username/hadoop-unjar-xxxx/ and extracts all the classes
>> there.
>> The fat jar is pretty big, so it took up a lot of space (particularly
>> inodes) and ran out of quota.
>> I wonder why we have to unjar these classes on the **client node**?
>> The jar won't even be accessed until it reaches the compute nodes, right?

Harsh J
