hadoop-common-user mailing list archives

From Arun C Murthy <...@yahoo-inc.com>
Subject Re: Teragen defaults to 2 maps; terasort defaults to 1 reducer
Date Mon, 29 Jun 2009 21:59:54 GMT
Both are due to the default number of maps and reduces in Map-Reduce.

Set them explicitly with -D, placed after the program name:
$ bin/hadoop jar hadoop-*-dev-examples.jar teragen -Dmapred.map.tasks=8000 10000000000 /tera/in
$ bin/hadoop jar hadoop-*-dev-examples.jar terasort -Dmapred.reduce.tasks=5300 /tera/in /tera/out
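
Applied to the commands from the original question, the same fix might look like the sketch below. It only prints the command lines rather than running them. The jar name and paths are taken from the question; the task counts of 16 and the variable and function names are illustrative assumptions, not recommendations. The point it shows is that -D goes after the program name (teragen or terasort), since the first argument after "hadoop jar" is taken to be the jar file.

```shell
#!/bin/sh
# Sketch only: prints the command lines, it does not launch any job.
# EXAMPLES_JAR and the paths come from the question; NUM_MAPS and
# NUM_REDUCES are placeholder values to tune for your own cluster.
EXAMPLES_JAR="hadoop-0.19.1-examples.jar"
NUM_MAPS=16
NUM_REDUCES=16
ROWS=1000000000   # teragen's first argument is a row count (100 bytes per row)

# -D must follow the program name so the program's option parsing sees it.
teragen_cmd() {
  echo "hadoop jar $EXAMPLES_JAR teragen -Dmapred.map.tasks=$NUM_MAPS $ROWS /terasort"
}

terasort_cmd() {
  echo "hadoop jar $EXAMPLES_JAR terasort -Dmapred.reduce.tasks=$NUM_REDUCES /terasort /out"
}

teragen_cmd
terasort_cmd
```

Once the printed lines look right, run them directly or pipe the script's output to sh.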

Arun

On Jun 29, 2009, at 2:03 PM, Gross, Danny wrote:

> Hello all,
>
>
>
> I'm trying to run the hadoop-0.19.1-examples.jar teragen and terasort
> programs on a cluster.  I have two problems with these programs:
>
>
>
> 1.	The generated data is not balanced across my cluster.  This is
> because the data is generated with only 2 maps.
>
> 	*	With the command "hadoop jar hadoop-0.19.1-examples.jar
> teragen 1000000000 /terasort"  (or any size) per the example doc, I get
> 2 maps.  With replication set to 2, this tends to place data more
> heavily on 2 of my nodes, and the cluster believes it is balanced.
>
>
>
> 2.	The terasort program runs out of disk space on the reduce
> operation.  This is because the program runs with a single reduce task.
>
>
> 	*	When running "hadoop jar hadoop-0.19.1-examples.jar
> terasort /terasort /out" per the example doc, I get the appropriate
> number of maps, but one reduce.  I've scoured the web and the new Hadoop
> book, and I'm just not able to change the number of reducers.  An
> example attempt was with the command "hadoop jar
> -Dmapred.reduce.tasks=16 hadoop-0.19.1-examples.jar terasort /terasort
> /out".
>
>
>
> Could anyone help shed some light on how to modify the execution of
> these programs to more appropriately balance the data, and spread the
> reduce load out across my cluster?
>
>
>
> Best regards,
>
>
>
> Danny Gross
>
>
>

