hadoop-general mailing list archives

From "Gibbon, Robert, VF-Group" <Robert.Gib...@vodafone.com>
Subject RE: Problem: when I run a pig's script I got one reduce task
Date Thu, 05 Aug 2010 13:38:05 GMT

Use the PARALLEL clause of course!

PARALLEL n

Increase the parallelism of a job by specifying the number of reduce
tasks, n. The default value for n is 1 (one reduce task). Note the
following:

    * PARALLEL only affects the number of reduce tasks. Map parallelism
      is determined by the input file, one map for each HDFS block.
    * If you don't specify PARALLEL, you still get the same map
      parallelism but only one reduce task.
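
As a rough sketch of where the clause goes: PARALLEL is attached to an
operator that starts a reduce phase, such as GROUP, JOIN, ORDER or DISTINCT.
The input path, field names, output path and the value 8 below are
made up for illustration and are not taken from the original script.

```pig
-- Hypothetical input path and schema, for illustration only.
logs = LOAD 'input/data.txt' USING PigStorage('\t')
       AS (user:chararray, bytes:long);

-- GROUP starts a reduce phase; PARALLEL 8 requests 8 reduce tasks for it.
-- The value 8 is an assumption, not a recommendation from the source.
grouped = GROUP logs BY user PARALLEL 8;

-- Sum the bytes per user inside each group.
totals = FOREACH grouped GENERATE group AS user, SUM(logs.bytes) AS total_bytes;

-- Hypothetical output directory; expect one part file per reduce task.
STORE totals INTO 'output/totals';
```

The 8 here assumes two reduce slots on each of the four worker nodes; pick a
value that matches the reduce capacity your tasktrackers are actually
configured with.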

For more information, see the Pig Cookbook. 


-----Original Message-----
From: Marcos Pinto [mailto:marcoscba@gmail.com] 
Sent: Thursday, 5 August 2010 14:56
To: general@hadoop.apache.org
Subject: Problem: when I run a pig's script I got one reduce task

Hi guys, how are you doing?

I am learning how to use Hadoop and I ran into this problem:
I set up a cluster with 5 nodes (4 datanodes and 1 namenode) and I used the
same configuration for the jobtracker and tasktracker.
When I run a Pig script I get many maps (about 15) but just 1
reduce!
This kills all the parallel processing. For example,
I have a 1 GB file, and when I run the Pig script on the cluster
it takes about 50 minutes to process. =(

So I would really appreciate it if someone could help with any tips. Thanks for
your time.
