hadoop-mapreduce-user mailing list archives

From: Ken Goodhope <kengoodh...@gmail.com>
Subject: Re: Help on streaming jobs
Date: Sat, 28 Aug 2010 04:25:42 GMT
Your number of reducers is not set by your number of keys. If you had
an input with a million unique keys, would you expect a million
reducers, each processing one record? The number of reducers is set in
the job configuration. It's the partitioner's job to divide the work
among those reducers, and in this case, since you didn't override the
default of one reducer, all the work went to the same one.
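
To bump the reducer count for a streaming job, something like this
should work (a sketch with your placeholder paths; I'm assuming a
0.20-era streaming jar, where generic -D options have to come before
the streaming-specific options):

path/to/hadoop jar path/to/streaming.jar \
    -D mapred.reduce.tasks=2 \
    -input path/to/input \
    -output path/to/output \
    -mapper my_own_mapper \
    -reducer my_own_reducer

With two reducers, the default hash partitioner sends all records that
share a key to the same reducer, though tag1 and tag2 could still hash
to the same reducer by chance. Notes on your other two questions (key
recognition, and the key showing up on the reducer's STDIN) are below,
after your quoted message.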

On Friday, August 27, 2010, Xin Feng <drestion@gmail.com> wrote:
> Hi,
>
> First post ....
>
> I wrote my own mapper and reducer in c++.
>
> I tried submitting the streaming jobs using the following command:
>
> path/to/hadoop jar path/to/streaming.jar -input path/to/input -output
> path/to/output -mapper my_own_mapper -reducer my_own_reducer
>
> The result shows that only one reducer was created every time, and it
> received ALL the output from all the mappers.
>
> I ran lots of tests and think this may be caused by the partitioner
> failing to recognise the keys. So my question is: how do I get Hadoop
> to recognise the key?
>
> Below is case test with sample files:
>
> the content of input file1:
>
> tag1  123456   +  kaka KKKGGSSSSGG
> tag1  111         +  abc KKKGGGGGG
> tag2  1211       +  ddd AAAAKKGG
>
> I am assuming that "tag1" and "tag2" will be recognised as keys,
> because they are the prefixes up to the first tab.
>
> the mapper will printf the following to STDOUT:
>
> tag1 123456
> tag1 111
> tag2 1211
>
> I am ASSUMING that the partitioner will generate TWO partitions:
>
> partition 1:
> tag1   123456
> tag1   111
>
> partition 2:
> tag2  1211
>
> which will initiate two reducers, because two keys exist.
>
> however, it turns out the reducer got only one partition:
>
> tag1 123456
> tag1 111
> tag2 1211
>
> my reducer gets the following from its STDIN (confirmed):
>
> tag1 123456
> tag1 111
> tag2 1211
>
> Please note that "tag1", the assumed key, was also included (is
> Hadoop supposed to do this?).
>
> Can anyone help me with this? Much appreciated!
>
> PS: I also tried
> -inputformat KeyValueInputFormat
> and it still failed.
>
>
> Xin
>
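
On getting Hadoop to recognise the key: by default, streaming treats
everything in a mapper output line up to the first tab as the key and
the rest of the line as the value. So check that my_own_mapper writes a
real tab character between tag1 and 123456; if it writes spaces only,
the whole line becomes the key. And if you ever need the first N
tab-separated fields treated as the key, streaming has an option for
that too (a sketch; the field count here is just an example):

path/to/hadoop jar path/to/streaming.jar \
    -D stream.num.map.output.key.fields=2 \
    -D mapred.reduce.tasks=2 \
    -input path/to/input \
    -output path/to/output \
    -mapper my_own_mapper \
    -reducer my_own_reducer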
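
And yes, the key is supposed to show up on your reducer's STDIN. Unlike
the Java API, a streaming reducer just receives the sorted
key<TAB>value lines and has to detect key boundaries itself. A minimal
sketch of that loop as a shell reducer (hypothetical; it simply counts
records per key):

#!/usr/bin/env bash
# Streaming reducer sketch: lines arrive as key<TAB>value, sorted by
# key; emit one line per key when the key changes, and once at the end.
prev=""
count=0
while IFS=$'\t' read -r key value; do
    if [[ -n "$prev" && "$key" != "$prev" ]]; then
        printf '%s\t%d\n' "$prev" "$count"
        count=0
    fi
    prev="$key"
    count=$((count + 1))
done
if [[ -n "$prev" ]]; then
    printf '%s\t%d\n' "$prev" "$count"
fi

Your C++ reducer needs the equivalent of that key-change check; if it
assumes one key per invocation, as a Java reduce() call would get, it
will look like everything was merged into one group.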
