hadoop-mapreduce-user mailing list archives

From Thamizhannal Paramasivam <thamizhanna...@gmail.com>
Subject Re: reducer behavior
Date Tue, 24 Jan 2012 11:44:02 GMT
Thanks a lot, Harsh.
I am fairly sure it is a logic issue in the reducer; with reducer=1 it
works as expected. But the counters report the expected numbers
irrespective of the number of reducers.
Here is the counter output:
12/01/24 17:02:16 INFO mapred.JobClient:     NUM_RECORDS=66
12/01/24 17:02:16 INFO mapred.JobClient:   Job Counters
12/01/24 17:02:16 INFO mapred.JobClient:     Launched reduce tasks=4
12/01/24 17:02:16 INFO mapred.JobClient:     Launched map tasks=2
12/01/24 17:02:16 INFO mapred.JobClient:     Data-local map tasks=2
12/01/24 17:02:16 INFO mapred.JobClient:   FileSystemCounters
12/01/24 17:02:16 INFO mapred.JobClient:     FILE_BYTES_READ=1028
12/01/24 17:02:16 INFO mapred.JobClient:     HDFS_BYTES_READ=984
12/01/24 17:02:16 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=2288
12/01/24 17:02:16 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=5139
12/01/24 17:02:16 INFO mapred.JobClient:   Map-Reduce Framework
12/01/24 17:02:16 INFO mapred.JobClient:     Reduce input groups=6
12/01/24 17:02:16 INFO mapred.JobClient:     Combine output records=0
12/01/24 17:02:16 INFO mapred.JobClient:     Map input records=6
12/01/24 17:02:16 INFO mapred.JobClient:     Reduce shuffle bytes=873
12/01/24 17:02:16 INFO mapred.JobClient:     Reduce output records=66
12/01/24 17:02:16 INFO mapred.JobClient:     Spilled Records=12
12/01/24 17:02:16 INFO mapred.JobClient:     Map output bytes=992
12/01/24 17:02:16 INFO mapred.JobClient:     Map input bytes=794
12/01/24 17:02:16 INFO mapred.JobClient:     Combine input records=0
12/01/24 17:02:16 INFO mapred.JobClient:     Map output records=6
12/01/24 17:02:16 INFO mapred.JobClient:     Reduce input records=6

The counters say Reduce input records=6 and Reduce output records=66, but
there are actually only 22 records in the reducer output files.

I use a custom output format:
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;

public class CustomMultipleTextOutputFormat<K, V> extends
        MultipleTextOutputFormat<K, V> {

    @Override
    protected String generateFileNameForKeyValue(K key, V value, String name) {
        // Keys look like "field0%field1%filename"; use the third field as
        // the output filename, falling back to the whole key otherwise.
        String[] keys = key.toString().split("%");
        if (keys.length != 3) {
            return key.toString();
        }
        return keys[2];
    }
}
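One possible explanation for the shortfall (an assumption on my part, not confirmed from the thread): with MultipleTextOutputFormat each reduce task writes files named by generateFileNameForKeyValue under the job output directory, so if keys that generate the same filename are partitioned to different reduce tasks, those tasks emit identically named files that can overwrite one another. A standalone sketch of that check, using hypothetical keys and the same third-field filename rule:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class FilenameCollisionCheck {
    // Same arithmetic as Hadoop's default HashPartitioner.
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Same rule as generateFileNameForKeyValue above: third "%"-field.
    static String fileNameFor(String key) {
        String[] parts = key.split("%");
        return parts.length == 3 ? parts[2] : key;
    }

    public static void main(String[] args) {
        // Hypothetical keys, not the actual job's data.
        String[] keys = {"u1%x%out1", "u2%y%out1", "u3%z%out2"};
        Map<String, Set<Integer>> fileToReducers = new HashMap<>();
        for (String k : keys) {
            fileToReducers
                .computeIfAbsent(fileNameFor(k), f -> new HashSet<>())
                .add(partitionFor(k, 4));
        }
        // Any filename generated by more than one reducer is a collision risk.
        fileToReducers.forEach((f, reducers) -> {
            if (reducers.size() > 1) {
                System.out.println(f + " written by reducers " + reducers);
            }
        });
    }
}
```

If the check reports collisions for your real keys, that would explain fewer records on disk than the Reduce output records counter claims.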
I am not sure what I am missing. Any suggestion would be appreciated.
Thanks,
Tamil

On Sun, Jan 22, 2012 at 1:24 AM, Harsh J <harsh@cloudera.com> wrote:

> The only difference is that with 4 reducers your keys get partitioned
> based on their hashCode() implementation (if you use the default hash
> partitioner), and each partition is sent to one reducer. If it's a custom
> key implementation, I'd check its hashCode() first.
>
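For reference, the partitioning Harsh describes is, in Hadoop's default HashPartitioner, (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks. A standalone sketch (the key strings here are hypothetical, not from the actual job):

```java
public class PartitionSketch {
    // Mask off the sign bit, then take the remainder by the reducer count,
    // mirroring the default HashPartitioner.
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        String[] keys = {"u1%x%out1", "u2%y%out1", "u3%z%out2"};
        for (String k : keys) {
            System.out.println(k + " -> reducer " + partitionFor(k, 4));
        }
    }
}
```

A key type whose hashCode() is inconsistent with equals() can scatter records for the same logical key across reducers, which is why a custom key implementation is worth checking first.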
> Check the input record counters on your reducers against the total map
> output record counter - the former should add up to the latter. Also make
> sure you aren't skipping any values from the reducer's iterator under any
> condition in your reduce operation.
>
> I'm guessing it's mostly your logic that's somehow causing this, but I
> don't have your source to say that for sure.
>
> On 21-Jan-2012, at 11:47 PM, Thamizhannal Paramasivam wrote:
>
> Hi All,
> I am experimenting with a MapReduce program on Hadoop 0.19. The program
> has a single input file with 7 records (later it may have many records
> across multiple files), and each input record is supposed to produce 11
> output records. When it runs with no_of_reducer=4, it produces only 33
> records; when I run with no_of_reducer=1, it produces 77 records as
> expected.
>
> What could be the reason for this? Am I missing any configuration
> parameter?
>
> Thanks
> Tamil
>
>
> --
> Harsh J
> Customer Ops. Engineer, Cloudera
>
>
