hadoop-common-dev mailing list archives

From "Martin Eckert (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-4511) Support for Manifest file inside Distributed Caches (Archives)
Date Fri, 24 Oct 2008 02:27:44 GMT
Support for Manifest file inside Distributed Caches (Archives)
--------------------------------------------------------------

                 Key: HADOOP-4511
                 URL: https://issues.apache.org/jira/browse/HADOOP-4511
             Project: Hadoop Core
          Issue Type: Improvement
    Affects Versions: 0.17.2
            Reporter: Martin Eckert
            Priority: Minor


I'm using the DistributedCache API to add a library package to my Hadoop job. The library
bundle consists of a JAR file, native library files and data files. At this point it is
pretty cumbersome to set up the job properly so that the library can be used from within
the map/reduce job.

The best way I could come up with was to keep the <lib>.jar file outside of the archive
file and use the -libjars argument to point to the external JAR file. The archive is submitted
using DistributedCache.setCacheArchives() and DistributedCache.createSymlink().
To add the library path (with the native library files), I append -Djava.library.path=./symlink/lib
to the mapred.child.java.opts JobConf option. The config file inside the archive is referenced
by its relative path (e.g. ./symlink/conf/config.txt).
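
The two values this workaround depends on can be computed in one place. The following is
a minimal sketch that uses only the JDK, not Hadoop itself: it builds the cache archive URI
(the DistributedCache takes the symlink name from the URI fragment) and the extended
mapred.child.java.opts string. The class name, the helper names, the example HDFS location
and the -Xmx200m starting value are assumptions made for illustration; the Hadoop calls in
the comments are the ones named in this message.

```java
import java.net.URI;

// Hypothetical helper; only the JDK is used so the sketch compiles on its own.
class CacheArchiveSettings {

    // The symlink name is carried in the URI fragment, so
    // hdfs://namenode/libs/mylib.zip#symlink is linked as ./symlink.
    static URI archiveUri(String archiveLocation, String linkName) {
        return URI.create(archiveLocation + "#" + linkName);
    }

    // Appends -Djava.library.path=./<linkName>/lib to the existing child JVM options.
    static String childJavaOpts(String existingOpts, String linkName) {
        String libraryPath = "-Djava.library.path=./" + linkName + "/lib";
        if (existingOpts == null || existingOpts.trim().isEmpty()) {
            return libraryPath;
        }
        return existingOpts.trim() + " " + libraryPath;
    }

    public static void main(String[] args) {
        // Assumed example location and option values.
        URI archive = archiveUri("hdfs://namenode/libs/mylib.zip", "symlink");
        String opts = childJavaOpts("-Xmx200m", "symlink");
        // In the job driver these values would be handed to Hadoop roughly as:
        //   DistributedCache.setCacheArchives(new URI[] { archive }, conf);
        //   DistributedCache.createSymlink(conf);
        //   conf.set("mapred.child.java.opts", opts);
        System.out.println(archive);
        System.out.println(opts);
    }
}
```

The JAR file itself is still passed separately with -libjars in this setup.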

It would be very helpful if these settings could largely be encapsulated inside the archive
itself in the form of a manifest file. The manifest file could define the relative path(s) to
the JAR file(s) and library path(s). Those would be read automatically and added to the job's
class path and library path.

The config file could be referenced and assigned a name inside the manifest, so that in the
code it would be available through the JobConf.get() method and could be used where needed.
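
To make the idea more concrete, here is a minimal sketch of how such a manifest could be
read. Nothing in it is an existing Hadoop feature: the properties-style format, the key names
(jars, library.paths, config.<name>) and the ArchiveManifest class are all assumptions chosen
for illustration. It resolves every entry relative to the symlink under which the archive is
unpacked.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical reader for a manifest stored inside a cache archive.
class ArchiveManifest {
    final List<String> classPath = new ArrayList<String>();
    final List<String> libraryPath = new ArrayList<String>();
    final Map<String, String> configFiles = new TreeMap<String, String>();

    // linkName is the symlink under which the archive appears in the task directory.
    static ArchiveManifest parse(String manifestText, String linkName) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(manifestText));
        } catch (IOException e) {
            throw new IllegalStateException("unreadable manifest", e);
        }
        ArchiveManifest manifest = new ArchiveManifest();
        String root = "./" + linkName + "/";
        addAll(manifest.classPath, props.getProperty("jars"), root);
        addAll(manifest.libraryPath, props.getProperty("library.paths"), root);
        // Every key of the form config.<name> maps a name to a file inside the archive.
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith("config.")) {
                String name = key.substring("config.".length());
                manifest.configFiles.put(name, root + props.getProperty(key).trim());
            }
        }
        return manifest;
    }

    // Splits a comma-separated list and prefixes each entry with the archive root.
    private static void addAll(List<String> target, String commaSeparated, String root) {
        if (commaSeparated == null) {
            return;
        }
        for (String entry : commaSeparated.split(",")) {
            String trimmed = entry.trim();
            if (!trimmed.isEmpty()) {
                target.add(root + trimmed);
            }
        }
    }

    public static void main(String[] args) {
        String text = "jars = mylib.jar\n"
                + "library.paths = lib\n"
                + "config.mylib.config = conf/config.txt\n";
        ArchiveManifest manifest = parse(text, "symlink");
        System.out.println(manifest.classPath);
        System.out.println(manifest.libraryPath);
        System.out.println(manifest.configFiles);
    }
}
```

With something like this in the framework, each entry of configFiles could be stored in the
job configuration under its name, so that task code would look up the path with
JobConf.get("mylib.config") instead of hard-coding ./symlink/conf/config.txt.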

This approach would open up other opportunities as well, but mainly it would make deployment
and distribution of archived packages for Hadoop much easier.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

