hadoop-mapreduce-user mailing list archives

From Ravikant Dindokar <ravikant.i...@gmail.com>
Subject Re: Reducer called twice for same key
Date Mon, 29 Jun 2015 05:51:14 GMT
I am adding the source code for more clarity.

The problem statement is simple:

PartitionFileMapper: takes an input file with tab-separated values V, P.
It emits (V, -1#P).

ALFileMapper: takes an input file with tab-separated values V, EL.
It emits (V, E#-1).

In the reducer I want to emit
(V, E#P)
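
The intended per-key merge can be sketched in plain Java, without any Hadoop types. This is only an illustration under stated assumptions: the class and method names (JoinSketch, merge) are hypothetical and are not taken from the attached source, and each value is assumed to have the form left#right, with -1 marking the side that the emitting mapper did not know. The sample values are simplified from the logged output.

```java
import java.util.Arrays;
import java.util.List;

public class JoinSketch {
    // Merge all values seen for one key into a single "E#P" string.
    // Assumption: every value looks like "E#P", where "-1" is a placeholder
    // for the half that the emitting mapper could not fill in.
    static String merge(List<String> values) {
        String e = "-1";
        String p = "-1";
        for (String value : values) {
            int sep = value.lastIndexOf('#');
            if (sep < 0) {
                continue; // skip values that do not contain the '#' separator
            }
            String left = value.substring(0, sep);
            String right = value.substring(sep + 1);
            if (!left.equals("-1")) {
                e = left;   // this value came from the mapper that knows E
            }
            if (!right.equals("-1")) {
                p = right;  // this value came from the mapper that knows P
            }
        }
        return e + "#" + p;
    }

    public static void main(String[] args) {
        // Both values for key 391 arrive in the same call.
        System.out.println(merge(Arrays.asList("-1#11", "5352454#-1"))); // prints 5352454#11
    }
}
```

The merge only works if both values for a key arrive in the same reduce call, which is exactly what is not happening in my job.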

Thanks
Ravikant

On Mon, Jun 29, 2015 at 11:04 AM, Ravikant Dindokar <ravikant.iisc@gmail.com> wrote:

> By custom key, did you mean some class object? If so, then no.
>
> I have two map methods, each with a different file as input. Both map
> methods emit a *LongWritable* key. But in the stdout file of the container
> I can see the following,
>
> with key and value separated by ':':
>
> ./container_1435326857837_0036_01_000102/stdout:Reduce:*391*:-1#11
> ./container_1435326857837_0036_01_000102/stdout:Reduce:*391*:3278620528725786624:5352454#-1
>
> For key 391 the reducer is called twice: once for the value from the first
> map and once for the value from the other map.
>
> In the map method I parse the string from the input file into a long variable
> and then emit it as a LongWritable.
>
> Is there something I am missing when I use MultipleInputs
> (org.apache.hadoop.mapreduce.lib.input.MultipleInputs)?
>
> Thanks
> Ravikant
>
> On Mon, Jun 29, 2015 at 9:22 AM, Harshit Mathur <mathursharp@gmail.com>
> wrote:
>
>> In MapReduce, it is not possible for two different reducers to get the
>> same key.
>> Have you perhaps created a custom key type? If so, there may be an issue
>> with its comparator.
>>
>> On Mon, Jun 29, 2015 at 12:40 AM, Ravikant Dindokar <ravikant.iisc@gmail.com> wrote:
>>
>>> Hi Hadoop user,
>>>
>>> I have two map classes processing two different input files. Both map
>>> functions emit the same (key, value) format.
>>>
>>> But the reducer is called twice for the same key: once for the value from
>>> the first map and once for the value from the other map.
>>>
>>> I am printing the (key, value) pairs in the reducer:
>>> ./container_1435326857837_0036_01_000102/stdout:Reduce:391:-1#11
>>> ./container_1435326857837_0036_01_000102/stdout:Reduce:391:3278620528725786624:5352454#-1
>>> ./container_1435326857837_0036_01_000102/stdout:Reduce:591:3278620528725852160:4194699#-1
>>> ./container_1435326857837_0036_01_000102/stdout:Reduce:591:-1#13
>>> ./container_1435326857837_0036_01_000102/stdout:Reduce:2391:-1#19
>>> ./container_1435326857837_0036_01_000102/stdout:Reduce:2391:3278620528725917696:5283986#-1
>>> ./container_1435326857837_0036_01_000102/stdout:Reduce:3291:3278620528725983232:4973087#-1
>>>
>>> Both maps emit a LongWritable key and a Text value.
>>>
>>>
>>> Any idea why this is happening?
>>> Is there any way to get the hash values that Hadoop generates for the keys
>>> emitted by the mapper?
>>>
>>> Thanks
>>> Ravikant
>>>
>>
>>
>>
>> --
>> Harshit Mathur
>>
>
>
