hadoop-common-user mailing list archives

From Aaron Kimball <aa...@cloudera.com>
Subject Re: Question on distribution of classes and jobs
Date Tue, 07 Apr 2009 18:36:42 GMT
On Fri, Apr 3, 2009 at 11:39 PM, Foss User <fossist@gmail.com> wrote:

> If I have written a WordCount.java job in this manner:
>
>        conf.setMapperClass(Map.class);
>        conf.setCombinerClass(Combine.class);
>        conf.setReducerClass(Reduce.class);
>
> So, you can see that three classes are being used here.  I have
> packaged these classes into a jar file called wc.jar and I run it like
> this:
>
> $ bin/hadoop jar wc.jar WordCountJob
>
> 1) I want to know, when the job runs on a 5-machine cluster, is the
> whole JAR file distributed across the 5 machines, or are the
> individual class files distributed separately?


The whole jar.
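
For reference, the framework identifies which jar to ship through the
JobConf. A minimal fragment, assuming the same WordCountJob driver class
from your command line:

    // Records the jar containing WordCountJob (wc.jar here) as the job
    // jar, so the framework can copy it out to the cluster.
    JobConf conf = new JobConf(WordCountJob.class);
    // Equivalent alternative: conf.setJarByClass(WordCountJob.class);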

>
>
> 2) Also, let us say the number of reducers is 2 while the number of
> mappers is 5. What happens in this case? How are the class files or
> jar files distributed?


It's uploaded into HDFS once, regardless of the map and reduce task
counts; specifically, into a subdirectory of wherever you configured
mapred.system.dir.
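
The mapper and reducer counts are just JobConf settings and don't change
how the code is shipped. A small fragment, continuing the conf from your
snippet:

    // Task counts are independent of jar distribution: every task,
    // map or reduce, runs against the same uploaded jar.
    conf.setNumMapTasks(5);    // a hint; the real map count follows the input splits
    conf.setNumReduceTasks(2); // the exact number of reduce tasks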

>
>
> 3) Are they distributed via RPC or HTTP?


The client uses the HDFS protocol to inject its jar file into HDFS. Then all
the TaskTrackers retrieve it with the same protocol.
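
To tie it together, here is a minimal end-to-end driver sketch in the old
org.apache.hadoop.mapred API. The Map, Combine, and Reduce classes are the
ones from your quoted snippet; the Text/IntWritable output types and the
argument-based paths are assumptions, not from the original mail:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class WordCountJob {
      public static void main(String[] args) throws Exception {
        // The jar containing this class (wc.jar) becomes the job jar.
        JobConf conf = new JobConf(WordCountJob.class);
        conf.setJobName("wordcount");
        conf.setOutputKeyClass(Text.class);         // assumed output types
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Combine.class);
        conf.setReducerClass(Reduce.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));  // assumed paths
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        // runJob uploads the job jar into HDFS (under mapred.system.dir);
        // each TaskTracker then fetches it over the same HDFS protocol
        // before launching its tasks.
        JobClient.runJob(conf);
      }
    }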
