hadoop-mapreduce-user mailing list archives

From Xin Feng <drest...@gmail.com>
Subject Re: Help on streaming jobs
Date Sat, 28 Aug 2010 05:09:44 GMT
In both the mapper and the reducer, I used:

    char key[256], value[256];
    while (fscanf(stdin, "%s\t%s", key, value) == 2) {
        /* consume one key/value record */
    }

to exhaust stdin. Is this the reason that only one reducer was
initiated? By the way, there were always a couple of mappers
initiated.
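
For reference, a line-oriented version that splits each record on the
first tab would look something like this (a rough sketch; names and
buffer handling are just illustrative):

    #include <iostream>
    #include <string>

    int main() {
        std::string line;
        while (std::getline(std::cin, line)) {
            // Streaming hands the reducer raw "key<TAB>value" lines,
            // so the key has to be split off by hand.
            std::string::size_type tab = line.find('\t');
            std::string key = line.substr(0, tab);
            std::string value =
                (tab == std::string::npos) ? "" : line.substr(tab + 1);
            // ... process one record ...
        }
        return 0;
    }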

Xin



On Sat, Aug 28, 2010 at 12:52 AM, Xin Feng <drestion@gmail.com> wrote:
> Did you mean that I should include
>
>     -D mapreduce.job.reduces=2
>
> since two tags exist?
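>
> i.e. something like this (paths are placeholders, and I believe the
> generic -D option has to come before the streaming-specific options):
>
>     path/to/hadoop jar path/to/streaming.jar \
>       -D mapreduce.job.reduces=2 \
>       -input path/to/input -output path/to/output \
>       -mapper my_own_mapper -reducer my_own_reducer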
>
>
> Xin
>
>
>
> On Sat, Aug 28, 2010 at 12:25 AM, Ken Goodhope <kengoodhope@gmail.com> wrote:
>> Your number of reducers is not set by your number of keys. If you had
>> an input with a million unique keys, would you expect a million
>> reducers, each processing one record? The number is set in the conf.
>> It's the partitioner's job to divide the work among those reducers,
>> and in this case, since you didn't override the default of one, all
>> work went to the same reducer.
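>>
>> Conceptually, the default partitioner does something like this (a C++
>> sketch for illustration only; Hadoop's real HashPartitioner is Java,
>> but the idea is the same):
>>
>>     #include <functional>
>>     #include <string>
>>
>>     // Same key -> same hash -> same partition, so all records sharing
>>     // a key land on the same reducer.
>>     int partition_for(const std::string& key, int num_reducers) {
>>         return static_cast<int>(std::hash<std::string>{}(key) % num_reducers);
>>     }
>>
>> With the default of one reducer, every key maps to partition 0.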
>>
>> On Friday, August 27, 2010, Xin Feng <drestion@gmail.com> wrote:
>>> Hi,
>>>
>>> First post ....
>>>
>>> I wrote my own mapper and reducer in c++.
>>>
>>> I tried submitting the streaming jobs using the following command:
>>>
>>> path/to/hadoop jar path/to/streaming.jar -input path/to/input -output
>>> path/to/output -mapper my_own_mapper -reducer my_own_reducer
>>>
>>> The result shows that only one reducer was created every time, and it
>>> received ALL output from all mappers.
>>>
>>> I did lots of tests and thought that this may be caused by the
>>> partitioner failing to recognise the keys. So my question is: how do I
>>> let Hadoop recognise the key?
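>>>
>>> (Is it something along these lines? I am just guessing from the
>>> streaming docs -- by default everything up to the first tab is the
>>> key, and the key-field options would make that explicit:)
>>>
>>>     -D stream.num.map.output.key.fields=1 \
>>>     -partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner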
>>>
>>> Below is a test case with sample files.
>>>
>>> The content of input file1:
>>>
>>> tag1  123456   +  kaka KKKGGSSSSGG
>>> tag1  111         +  abc KKKGGGGGG
>>> tag2  1211       +  ddd AAAAKKGG
>>>
>>> I am assuming that "tag1" and "tag2" will be recognized as keys,
>>> because they are the prefixes up to the first tab.
>>>
>>> The mapper will printf the following to STDOUT:
>>>
>>> tag1 123456
>>> tag1 111
>>> tag2 1211
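>>>
>>> (Inside the mapper that is something like the following -- tag and val
>>> are just illustrative names for the parsed fields; the separator must
>>> be a real tab for streaming to split key from value:)
>>>
>>>     #include <stdio.h>
>>>
>>>     /* emit one record per line as key<TAB>value */
>>>     void emit(const char *tag, const char *val) {
>>>         printf("%s\t%s\n", tag, val);
>>>     }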
>>>
>>> I am ASSUMING that the partitioner will generate TWO partitions:
>>>
>>> partition 1:
>>> tag1   123456
>>> tag1   111
>>>
>>> partition 2:
>>> tag2  1211
>>>
>>> which will initiate two reducers, because two keys exist.
>>>
>>> However, it turns out that the reducer got only one partition:
>>>
>>> tag1 123456
>>> tag1 111
>>> tag2 1211
>>>
>>> My reducer gets the following on its STDIN (confirmed):
>>>
>>> tag1 123456
>>> tag1 111
>>> tag2 1211
>>>
>>> Please note that "tag1", the assumed key, was also included (is Hadoop
>>> supposed to do that?).
>>>
>>> Can anyone help me with this? Much appreciated!
>>>
>>> PS: I also tried
>>> -inputformat KeyValueInputFormat
>>> but it still failed.
>>>
>>>
>>> Xin
>>>
>>
>
