hadoop-mapreduce-user mailing list archives

From Ed Mazur <ma...@cs.umass.edu>
Subject Re: Questions about dfs and MapRed in the Hadoop.
Date Tue, 05 Jan 2010 23:09:07 GMT
Hi Pedro,

I can answer a couple of these.

On Tue, Jan 5, 2010 at 5:46 PM, psdc1978 <psdc1978@gmail.com> wrote:
> 1 - What are the difference between the classes:
> org.apache.hadoop.mapred.Reducer.java and
> org.apache.hadoop.mapreduce.Reducer.java? In which case the 2 reducers
> are used?
> 2 - The same question for the Mapper.java?

These classes were refactored in 0.20. The older ones (in the mapred
package) were kept to maintain backwards compatibility; new code
generally uses the mapreduce package.
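To make the difference concrete, here is a minimal sketch of a reducer
written against the newer org.apache.hadoop.mapreduce API (the word-count
use case is my assumption, not from the thread). The old
org.apache.hadoop.mapred.Reducer was an interface whose reduce() took an
OutputCollector and a Reporter; the new one is a class whose reduce()
takes a single Context object:

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// New-API (org.apache.hadoop.mapreduce) reducer: sums the counts
// emitted by a word-count mapper. Class name is illustrative.
public class WordCountReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values,
                          Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();          // accumulate per-word counts
        }
        // One (word, total) pair per key, written via Context
        // instead of the old API's OutputCollector.
        context.write(key, new IntWritable(sum));
    }
}
```

Under the old API the equivalent method signature would be
reduce(Text key, Iterator&lt;IntWritable&gt; values,
OutputCollector&lt;Text, IntWritable&gt; output, Reporter reporter).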

> 4 - What's the purpose of the property in hdfs-site.xml called
> "dfs.replication"?
> I've read what is defined in the Hadoop site,
> "dfs.replication - Default block replication. The actual number of
> replications can be specified when the file is created. The default is
> used if replication is not specified in create time. ", but I still
> haven't understood it. Is it the number of machines a file will be
> replicated on?

Pretty much. Note that an HDFS file is stored as a collection of large
blocks (64 MB by default), and it is these blocks that are replicated:
each block is stored on dfs.replication different DataNodes.
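For illustration, dfs.replication is typically set cluster-wide in
hdfs-site.xml; a value of 3 (the usual default) means every block of a
file is stored on three DataNodes. A sketch, assuming a plain cluster
config:

```xml
<!-- hdfs-site.xml (sketch) -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
    <!-- Each HDFS block is stored on this many DataNodes unless a
         per-file replication factor is given at create time. -->
  </property>
</configuration>
```

A file's replication factor can also be changed after creation with the
shell, e.g. `hadoop fs -setrep 2 /path/to/file` (the path here is
hypothetical).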

