hadoop-mapreduce-user mailing list archives

From Xin Feng <drest...@gmail.com>
Subject Re: Help on streaming jobs
Date Sat, 28 Aug 2010 04:52:02 GMT
Did you mean that I should include:

-D mapreduce.job.reduces=2

since two tags exist?
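(For reference: adding `-D mapreduce.job.reduces=2` to the streaming command requests two reduce tasks. Whether the two tags then land on different reducers is decided by the default HashPartitioner, which computes `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. A minimal sketch of that arithmetic in Python; the hashCode reimplementation below is for illustration and mirrors Java's `String.hashCode` semantics:)

```python
def java_string_hashcode(s):
    """Reimplementation of Java's String.hashCode() as a signed 32-bit int."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - 0x100000000 if h >= 0x80000000 else h

def hash_partition(key, num_reducers):
    """Mimics Hadoop's default HashPartitioner:
    (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks."""
    return (java_string_hashcode(key) & 0x7FFFFFFF) % num_reducers

for key in ("tag1", "tag2"):
    print(key, "-> reducer", hash_partition(key, 2))
```

With two reduce tasks, "tag1" and "tag2" hash to different partitions here; in general a hash partitioner only balances keys across reducers, it does not guarantee one reducer per key.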


Xin



On Sat, Aug 28, 2010 at 12:25 AM, Ken Goodhope <kengoodhope@gmail.com> wrote:
> Your number of reducers is not set by your number of keys. If you had
> an input with a million unique keys, would you expect a million
> reducers each processing one record? The number is set in the conf.
> It's the partitioner's job to divide the work among those reducers, and
> in this case, since you didn't override the default of one, all work
> went to the same reducer.
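(Ken's point can be seen directly: with the default of a single reduce task, any "hash mod numReduceTasks" collapses to 0, so every key reaches the same reducer. A toy sketch, using Python's built-in `hash` as a stand-in for Java's `hashCode`:)

```python
# With numReduceTasks == 1, hash(key) % 1 is always 0,
# so every record goes to the one and only reducer.
keys = ["tag1", "tag2", "some-other-key"]
partitions = {(hash(k) & 0x7FFFFFFF) % 1 for k in keys}
print(partitions)  # every key maps to partition 0
```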
>
> On Friday, August 27, 2010, Xin Feng <drestion@gmail.com> wrote:
>> Hi,
>>
>> First post ....
>>
>> I wrote my own mapper and reducer in c++.
>>
>> I tried submitting the streaming jobs using the following command:
>>
>> path/to/hadoop jar path/to/streaming.jar -input path/to/input -output
>> path/to/output -mapper my_own_mapper -reducer my_own_reducer
>>
>> The result shows that only 1 reducer was created every time, and it
>> received ALL output from all mappers.
>>
>> I did lots of tests and thought that this may be caused by the failure
>> of the partitioner to recognise the keys. So my question is: how do I
>> let hadoop recognise the key?
>>
>> Below is case test with sample files:
>>
>> the content of input file1:
>>
>> tag1  123456   +  kaka KKKGGSSSSGG
>> tag1  111         +  abc KKKGGGGGG
>> tag2  1211       +  ddd AAAAKKGG
>>
>> I am assuming that "tag1" and "tag2" will be recognized as keys,
>> because they are prefixes up to the first tab.
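(For reference: by default, Hadoop Streaming does treat everything up to the first tab of a mapper output line as the key and the remainder as the value. A small sketch of that split, assuming the fields really are tab-separated, which matters because the sample above renders with spaces:)

```python
# Streaming's default: key = text before the first tab, value = the rest.
line = "tag1\t123456\t+\tkaka\tKKKGGSSSSGG"
key, _, value = line.partition("\t")
print(key)    # the partitioner sees only this part
print(value)  # everything after the first tab
```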
>>
>> the mapper will printf the following to STDOUT:
>>
>> tag1 123456
>> tag1 111
>> tag2 1211
>>
>> I am ASSUMING that the partitioner will generate TWO partitions:
>>
>> partition 1:
>> tag1   123456
>> tag1   111
>>
>> partition 2:
>> tag2  1211
>>
>> which will initiate two reducers, because two keys exist.
>>
>> however, it turns out that the reducer got only 1 partition:
>>
>> tag1 123456
>> tag1 111
>> tag2 1211
>>
>> my reducer will get from its STDIN (confirmed):
>>
>> tag1 123456
>> tag1 111
>> tag2 1211
>>
>> Please note, "tag1", the assumed key, was also included (is hadoop
>> supposed to do so?)
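(For reference: yes, this is expected. Streaming hands the reducer whole `key<TAB>value` lines on stdin, sorted by key; the reducer itself must split each line and detect key boundaries. A minimal sketch of such a reducer loop, assuming tab-separated mapper output:)

```python
import sys

def streaming_reducer(stream):
    """Group consecutive lines of a sort-ordered key<TAB>value stream.
    Streaming passes whole lines (key included) to the reducer, so the
    reducer splits on the first tab and watches for key changes."""
    current_key, values = None, []
    for line in stream:
        key, _, value = line.rstrip("\n").partition("\t")
        if key != current_key:
            if current_key is not None:
                yield current_key, values
            current_key, values = key, []
        values.append(value)
    if current_key is not None:
        yield current_key, values

# Example with the sample mapper output from above (already sorted):
sample = ["tag1\t123456\n", "tag1\t111\n", "tag2\t1211\n"]
for key, vals in streaming_reducer(sample):
    print(key, vals)
```

In production this would read `sys.stdin` instead of the `sample` list; the grouping logic is the same.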
>>
>> Can anyone help me with this? Appreciated!
>>
>> PS: I also tried
>> -inputformat KeyValueInputFormat
>> but it still failed.
>>
>>
>> Xin
>>
>
