hadoop-common-user mailing list archives

From "Kaluskar, Sanjay" <skalus...@informatica.com>
Subject Adding entries to classpath
Date Wed, 11 Aug 2010 07:48:28 GMT
I am using Hadoop indirectly through PIG, and some of the UDFs (defined
by me) need other jars at runtime (around 150), some of which have
conflicting resource names. Unpacking all of them and repacking into a
single jar therefore doesn't work. My solution is to create a single
top-level jar that names all the dependencies in the Class-Path entry of
its MANIFEST.MF. This is also simpler from a user's point of view. Of
course, this requires the top-level jar and all the dependencies to be
laid out in a directory structure that I can control. Currently I have a
root directory containing the top-level jar and a directory called lib;
all the dependencies are in lib, and the top-level jar names them as
lib/x.jar, lib/y.jar, etc. I package all of this as a single zip file
for easy installation.
Just to be clear this is the dir structure:
root dir
    |--- top-level.jar
    |--- lib
            |--- x.jar
            |--- y.jar
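Since a jar is just a zip file with a META-INF/MANIFEST.MF entry, the packaging step can be sketched with Python's standard zipfile module (the jar names are the placeholders from the layout above, and the stand-in dependency jars are empty; this is only an illustration of the manifest, not my build script):

```python
import os
import zipfile

# Hypothetical dependency list; in reality there are around 150 jars.
deps = ["lib/x.jar", "lib/y.jar"]

# Class-Path entries are relative paths, resolved against the directory
# containing top-level.jar when the JVM loads it.
manifest = (
    "Manifest-Version: 1.0\r\n"
    "Class-Path: " + " ".join(deps) + "\r\n"
    "\r\n"
)

os.makedirs("root/lib", exist_ok=True)
for dep in deps:
    # Empty stand-in jars, just so the layout exists for the sketch.
    with zipfile.ZipFile(os.path.join("root", dep), "w"):
        pass

with zipfile.ZipFile("root/top-level.jar", "w") as jar:
    jar.writestr("META-INF/MANIFEST.MF", manifest)
```

One caveat: the JAR spec caps each manifest line at 72 bytes, so with ~150 dependencies the real Class-Path value has to be wrapped onto continuation lines that each begin with a single space; the two-entry sketch above fits on one line.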
I can't register top-level.jar in my PIG script (the recommended
approach) because PIG then unpacks and repackages everything into a
single jar instead of including the jar on the classpath. I can't use
the distributed cache, because if I specify top-level.jar and lib
separately in mapred.cache.files, the relative directory locations
aren't preserved. And if I use the mapred.cache.archives option and
specify the zip file, I can't add the top-level jar to the classpath,
because the entries in mapred.job.classpath.files must be files that
were distributed via mapred.cache.files, not paths inside an unpacked
archive.
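For reference, the two distributed-cache attempts correspond roughly to settings like these (a sketch; the HDFS paths are illustrative, not my real layout):

```
# Attempt 1: caching the files individually -- the lib/ nesting is not
# preserved on the task nodes.
mapred.cache.files=hdfs:///tmp/udfs/top-level.jar,hdfs:///tmp/udfs/lib/x.jar,hdfs:///tmp/udfs/lib/y.jar

# Attempt 2: the archive keeps the layout, but I see no way to get
# top-level.jar from inside it onto mapred.job.classpath.files.
mapred.cache.archives=hdfs:///tmp/udfs/root.zip
```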
If mapred.child.java.opts also allowed java.class.path to be augmented
(similar to java.library.path, which I am using for native libs that I
store in another directory parallel to lib), that would have solved my
problem: I could have specified the zip in mapred.cache.archives and
added the jar to the classpath. Right now I can't see any solution other
than using a shared file system and adding top-level.jar to
HADOOP_CLASSPATH. That works because I am using a small cluster that has
a shared file system, but clearly it's not always feasible (and of
course, it modifies Hadoop's environment).
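Concretely, the workaround is just an environment setting on each node (a sketch; the shared path is illustrative):

```
# In conf/hadoop-env.sh on every node; the jar lives on a file system
# all nodes can see, which is why this only works on my small cluster.
export HADOOP_CLASSPATH=/shared/udfs/top-level.jar:$HADOOP_CLASSPATH
```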
Please suggest any alternatives you can think of.
