hadoop-mapreduce-user mailing list archives

From Pete Tyler <peteralanty...@gmail.com>
Subject Re: Deployment of jar files at startup
Date Mon, 20 Sep 2010 03:06:40 GMT
Thanks David, 

I've been trying to use DistributedCache as I've had it suggested to me twice but I'm afraid
I'm just not getting it. 

It appears I need to associate my use of DistributedCache.addFileToClassPath() with a specific
JobConf instance. If this is the case, what does addFileToClassPath() give me that I don't
already get with setJar()? The performance hit from using setJar() for every Job is huge, so
I assume having to use addFileToClassPath() for every Job will also be huge.

I'm looking to add a jar to my Hadoop classpath just once and then use it for many different
map/reduce jobs. Effectively, I am trying to achieve dynamically the same effect as hardcoding
my jar file into HADOOP_CLASSPATH in hadoop-env.sh on every node in my system. I still can't
see how to do this :(

On Sep 15, 2010, at 11:46 AM, David Rosenstrauch <darose@darose.net> wrote:

> On 09/14/2010 10:10 PM, Pete Tyler wrote:
>> I'm trying to figure out how to achieve the following from a Java client:
>> 1. My app (which is a web server) starts up.
>> 2. As part of startup, my jar file, which includes my map reduce classes, is
>> distributed to the hadoop nodes.
>> 3. My web app uses map reduce to extract data without the performance overhead of
>> each job deploying a jar file via setJar() or setJarByClass().
>> It looks like DistributedCache() has potential, but the need for commands like 'hadoop
>> fs -copyFromLocal ...' and API methods like '.getLocalCacheArchives()' looks to be at odds
>> with my scenario. Any thoughts?
>> -Peter
> For step 2, you have 2 options on how to implement:
> a) call DistributedCache.addFileToClassPath(jarFileURI, conf);
> b) have your app implement Tool, use ToolRunner to launch it, and specify a -libjars
> command line parm which will achieve the same effect as in (a).  See
> http://hadoop.apache.org/common/docs/r0.20.1/api/org/apache/hadoop/util/Tool.html
> and
> http://hadoop.apache.org/common/docs/r0.20.1/api/org/apache/hadoop/util/GenericOptionsParser.html#GenericOptions
> for details.
> HTH,
> DR
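
For reference, option (b) depends on GenericOptionsParser seeing a -libjars option, followed by a comma-separated list of jar paths, ahead of the tool's own arguments. Below is a minimal sketch of assembling such an argument array. It uses only the JDK so it can be run without a Hadoop installation. The class name LibJarsArgs, the helper withLibJars, and the jar path are hypothetical and not from the thread. In a real client the resulting array would presumably be handed to ToolRunner.run() together with a Configuration and a Tool implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LibJarsArgs {

    // Builds an argument array with "-libjars jar1,jar2,..." placed before
    // the tool's own arguments, since generic options are parsed first.
    // Hypothetical helper for illustration; not part of the Hadoop API.
    static String[] withLibJars(List<String> jars, String... toolArgs) {
        List<String> args = new ArrayList<>();
        if (!jars.isEmpty()) {
            args.add("-libjars");
            args.add(String.join(",", jars));
        }
        args.addAll(Arrays.asList(toolArgs));
        return args.toArray(new String[0]);
    }

    public static void main(String[] argv) {
        // Assumed jar location and tool arguments, purely illustrative.
        String[] args = withLibJars(
                Arrays.asList("/opt/myapp/lib/mapreduce-classes.jar"),
                "input", "output");
        System.out.println(String.join(" ", args));
    }
}
```

Note that this only shows how the arguments are laid out. Whether it avoids the per-job cost asked about above depends on how Hadoop ships and caches the listed jars, which this sketch does not address.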
