hadoop-mapreduce-user mailing list archives

From psdc1978 <psdc1...@gmail.com>
Subject Re: Only running hadoop Map tasks
Date Wed, 06 Jan 2010 11:23:37 GMT
See my question inline.

On Tue, Jan 5, 2010 at 6:32 PM, Owen O'Malley <omalley@apache.org> wrote:
>
> On Jan 5, 2010, at 9:13 AM, psdc1978 wrote:
>
>> 1 - I would like to see the output that the Maps produce in my
>> example. Is it possible to make hadoop run only the Map tasks,
>> excluding the Reduce tasks?
>
> Set the number of reduce tasks to 0.

I've updated the file "/opt/hadoop/src/mapred/mapred-default.xml" with
the following values:

<property>
  <name>mapred.reduce.tasks</name>
  <value>0</value>
  <description>The default number of reduce tasks per job. Typically set to 99%
  of the cluster's reduce capacity, so that if a node fails the reduces can
  still be executed in a single wave.
  Ignored when mapred.job.tracker is "local".
  </description>
</property>

<property>
  <name>mapred.reduce.parallel.copies</name>
  <value>0</value>
  <description>The default number of parallel transfers run by reduce
  during the copy(shuffle) phase.
  </description>
</property>

<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>0</value>
  <description>The maximum number of reduce tasks that will be run
  simultaneously by a task tracker.
  </description>
</property>

<property>
  <name>mapred.task.profile.reduces</name>
  <value>0-0</value>
  <description>To set the ranges of reduce tasks to profile.
  mapred.task.profile has to be set to true for the value to be accounted.
  </description>
</property>


Are these values enough to keep the reduce tasks from running? I don't
think so, because I've also searched the "/tmp/hadoop-pcosta/" directory
to find the output of the maps, but I can't find it. Is this output
written in binary?
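
For comparison, a minimal sketch of what Owen's suggestion might look like
as a site-level override. It assumes a 0.20-style layout where overrides go
in "conf/mapred-site.xml" (the "/opt/hadoop/conf/" location is a guess based
on the install path above), rather than in the mapred-default.xml under
"src/", which is normally bundled into the Hadoop jar at build time. Only
mapred.reduce.tasks is set; the other three properties above should not
need changing for a map-only job.

```xml
<?xml version="1.0"?>
<!-- Assumed location: /opt/hadoop/conf/mapred-site.xml (hypothetical path) -->
<configuration>
  <!-- Zero reduces: each map's output goes straight to the OutputFormat -->
  <property>
    <name>mapred.reduce.tasks</name>
    <value>0</value>
  </property>
</configuration>
```

With zero reduces, the map output should show up in the job's output
directory, as Owen describes in his answer to question 2, and not under
"/tmp/hadoop-pcosta/".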





>
>> 2 - Is the output of the Maps written into a temporary file?
>
> Each map's unsorted output will be sent to the OutputFormat, which writes it
> to the output directory.
>
>> 3 - How is the output of the maps passed to the reduce tasks? Is it
>> sent over a socket or read from a file on disk?
>
> MapReduce does not assume any shared disks between machines. The map outputs
> are transferred via HTTP.
>
> -- Owen
>
>



-- 
Pedro
