hadoop-mapreduce-user mailing list archives

From Something Something <mailinglist...@gmail.com>
Subject Re: Distributing our jars to all machines in a cluster
Date Wed, 16 Nov 2011 14:42:35 GMT
Bejoy - Thanks for the reply.  The '-libjars' is not working for me with
'hadoop jar'.  Also, as per the documentation:

  Generic Options

  The following options are supported by [...],
  fs <http://hadoop.apache.org/common/docs/current/commands_manual.html#fs>,
  fsck <http://hadoop.apache.org/common/docs/current/commands_manual.html#fsck>,
  job <http://hadoop.apache.org/common/docs/current/commands_manual.html#job>
  and fetchdt <http://hadoop.apache.org/common/docs/current/commands_manual.html#fetchdt>

Does it work for you?  If it does, please let me know.  "Pre-distributing"
definitely works, but is that the best way?  On a big cluster where the
Jars change often, it will be time-consuming.
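
For reference, '-libjars' is one of the generic options, so 'hadoop jar' only honors it when the driver class parses generic options (for example by implementing Tool and being launched through ToolRunner). The option also has to come before the application's own arguments, and it takes a comma-separated list of jars. Below is a minimal sketch that only assembles such a command line. The class name LibJarsCommand, the jar paths and the driver name com.example.MyDriver are hypothetical, and it assumes the main class is passed explicitly rather than read from the jar manifest.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: assembles a 'hadoop jar' command line with -libjars.
// It does not talk to Hadoop; it only shows the expected argument order.
class LibJarsCommand {

    // Order: hadoop jar <jobJar> <mainClass> [generic options] [app args].
    // Assumption: the driver parses generic options, otherwise -libjars
    // is handed to the application as an ordinary argument.
    static List<String> build(String jobJar, String mainClass,
                              List<String> libJars, List<String> appArgs) {
        List<String> cmd = new ArrayList<>();
        cmd.add("hadoop");
        cmd.add("jar");
        cmd.add(jobJar);
        cmd.add(mainClass);
        if (!libJars.isEmpty()) {
            // -libjars takes one comma-separated value, not one flag per jar.
            cmd.add("-libjars");
            cmd.add(String.join(",", libJars));
        }
        // Application arguments go last, after all generic options.
        cmd.addAll(appArgs);
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = build(
                "myjob.jar", "com.example.MyDriver",
                Arrays.asList("/opt/libs/udfs.jar", "/opt/libs/util.jar"),
                Arrays.asList("/input", "/output"));
        // prints: hadoop jar myjob.jar com.example.MyDriver -libjars /opt/libs/udfs.jar,/opt/libs/util.jar /input /output
        System.out.println(String.join(" ", cmd));
    }
}
```

If the driver's main() reads its arguments directly instead of going through the generic options parser, the same command line will start the job but the extra jars will not be shipped, which may be what is happening here.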

Also, how does Pig do it?  We update Pig UDFs often and put them only on
the 'client' machine (machine that starts the Pig job) and the UDF becomes
available to all machines in the cluster - automagically!  Is Pig doing the
pre-distributing for us?

Thanks for your patience & help with our questions.

On Wed, Nov 16, 2011 at 6:29 AM, Something Something <
mailinglists19@gmail.com> wrote:

> Hmm... there must be a different way 'cause we don't need to do that to
> run Pig jobs.
> On Tue, Nov 15, 2011 at 10:58 PM, Daan Gerits <daan.gerits@gmail.com> wrote:
>> There might be different ways but currently we are storing our jars onto
>> HDFS and register them from there. They will be copied to the machine once
>> the job starts. Is that an option?
>> Daan.
>> On 16 Nov 2011, at 07:24, Something Something wrote:
>> > Until now we were manually copying our Jars to all machines in a Hadoop
>> > cluster.  This worked while our cluster was small.  Now our cluster is
>> > getting bigger.  What's the best way to start a Hadoop Job that
>> > automatically distributes the Jar to all machines in a cluster?
>> >
>> > I read the doc at:
>> > http://hadoop.apache.org/common/docs/current/commands_manual.html#jar
>> >
>> > Would -libjars do the trick?  But we need to use 'hadoop job' for that,
>> > right?  Until now, we were using 'hadoop jar' to start all our jobs.
>> >
>> > Needless to say, we are getting our feet wet with Hadoop, so appreciate
>> > your help with our dumb questions.
>> >
>> > Thanks.
>> >
>> > PS:  We use Pig a lot, which automatically does this, so there must be a
>> > clean way to do this.
