hadoop-mapreduce-user mailing list archives

From Ravikant Dindokar <ravikant.i...@gmail.com>
Subject Re: Reducer called twice for same key
Date Mon, 29 Jun 2015 06:10:47 GMT
Hi Harshit,

Please find PALReducer attached.

Thanks
Ravikant

On Mon, Jun 29, 2015 at 11:31 AM, Harshit Mathur <mathursharp@gmail.com>
wrote:

> Can you share PALReducer also?
>
> On Mon, Jun 29, 2015 at 11:21 AM, Ravikant Dindokar <
> ravikant.iisc@gmail.com> wrote:
>
>> Adding the source code for more clarity.
>>
>> The problem statement is simple:
>>
>> PartitionFileMapper: takes an input file of tab-separated values V, P
>> and emits (V, -1#P).
>>
>> ALFileMapper: takes an input file of tab-separated values V, EL
>> and emits (V, E#-1).
>>
>> In the reducer I want to emit (V, E#P).
>>
>> Thanks
>> Ravikant
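
The merge described above can be sketched in plain Java, without the Hadoop classes, to show what the reducer is expected to do once both tagged values for a key arrive in the same call. This is only an illustration: the class name `TaggedJoinSketch`, the method `merge`, and the choice to skip values without a `#` are assumptions, not the contents of PALReducer.

```java
import java.util.Arrays;
import java.util.List;

public class TaggedJoinSketch {
    /** Marker the two mappers use for "this side of the pair is missing". */
    static final String MISSING = "-1";

    /**
     * Merges all values seen for one key into a single "E#P" string.
     * Each value is assumed to look like "E#-1" or "-1#P", as in the thread.
     */
    static String merge(Iterable<String> values) {
        String edges = MISSING;
        String partition = MISSING;
        for (String value : values) {
            int sep = value.lastIndexOf('#');
            if (sep < 0) {
                continue; // hypothetical handling: ignore malformed values
            }
            String left = value.substring(0, sep);
            String right = value.substring(sep + 1);
            if (!MISSING.equals(left)) {
                edges = left;      // value came from ALFileMapper
            }
            if (!MISSING.equals(right)) {
                partition = right; // value came from PartitionFileMapper
            }
        }
        return edges + "#" + partition;
    }

    public static void main(String[] args) {
        // The two values printed for key 391 in the container stdout.
        List<String> valuesFor391 =
                Arrays.asList("-1#11", "3278620528725786624:5352454#-1");
        System.out.println("391\t" + merge(valuesFor391));
    }
}
```

The sketch only produces E#P when both values reach the same call. If the framework delivers them in two separate reduce() calls, as reported in this thread, each call sees one side only and the other side stays at -1.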
>>
>> On Mon, Jun 29, 2015 at 11:04 AM, Ravikant Dindokar <
>> ravikant.iisc@gmail.com> wrote:
>>
>>> By custom key, did you mean some class object? If so, then no.
>>>
>>> I have two map methods, each with a different file as input, and both
>>> emit a *LongWritable* key. But in the stdout of the container I can see
>>> the following output,
>>>
>>> key and value separated by ':'
>>>
>>> ./container_1435326857837_0036_01_000102/stdout:Reduce:*391*:-1#11
>>> ./container_1435326857837_0036_01_000102/stdout:Reduce:*391*:3278620528725786624:5352454#-1
>>>
>>> For key 391 the reducer is called twice: once for the value from the
>>> first map and once for the value from the other map.
>>>
>>> In the map method I parse the string from the input file into a long
>>> and then emit it as a LongWritable.
>>>
>>> Is there something I am missing when I use MultipleInputs
>>> (org.apache.hadoop.mapreduce.lib.input.MultipleInputs)?
>>>
>>> Thanks
>>> Ravikant
>>>
>>> On Mon, Jun 29, 2015 at 9:22 AM, Harshit Mathur <mathursharp@gmail.com>
>>> wrote:
>>>
>>>> In MapReduce, it is not possible for two different reducers to receive
>>>> the same key.
>>>> Have you perhaps created a custom key type? If so, there is probably an
>>>> issue with its comparator.
>>>>
>>>> On Mon, Jun 29, 2015 at 12:40 AM, Ravikant Dindokar <
>>>> ravikant.iisc@gmail.com> wrote:
>>>>
>>>>> Hi Hadoop user,
>>>>>
>>>>> I have two map classes processing two different input files. Both map
>>>>> functions emit the same (key, value) format.
>>>>>
>>>>> But the reducer is called twice for the same key: once for the value
>>>>> from the first map and once for the value from the other map.
>>>>>
>>>>> I am printing the (key, value) pairs in the reducer:
>>>>> ./container_1435326857837_0036_01_000102/stdout:Reduce:391:-1#11
>>>>> ./container_1435326857837_0036_01_000102/stdout:Reduce:391:3278620528725786624:5352454#-1
>>>>> ./container_1435326857837_0036_01_000102/stdout:Reduce:591:3278620528725852160:4194699#-1
>>>>> ./container_1435326857837_0036_01_000102/stdout:Reduce:591:-1#13
>>>>> ./container_1435326857837_0036_01_000102/stdout:Reduce:2391:-1#19
>>>>> ./container_1435326857837_0036_01_000102/stdout:Reduce:2391:3278620528725917696:5283986#-1
>>>>> ./container_1435326857837_0036_01_000102/stdout:Reduce:3291:3278620528725983232:4973087#-1
>>>>>
>>>>>
>>>>> Both maps emit a LongWritable key and a Text value.
>>>>>
>>>>> Any idea why this is happening?
>>>>> Is there any way to get the hash values generated by Hadoop for the
>>>>> keys emitted by the mapper?
>>>>>
>>>>> Thanks
>>>>> Ravikant
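
On the question of seeing the hash values: the default HashPartitioner picks a reduce task with `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`, so the value can be recomputed outside the job. The sketch below reproduces that arithmetic in plain Java. Two things here are assumptions: the key is hashed with `Long.hashCode`, which may not match `LongWritable.hashCode()` in every Hadoop version, and the count of 100 reduce tasks is hypothetical, since the thread does not state it. The names `PartitionSketch`, `hashOf`, and `partitionFor` are illustrative.

```java
public class PartitionSketch {
    /** Same arithmetic as the default HashPartitioner.getPartition. */
    static int partitionFor(int keyHash, int numReduceTasks) {
        // Masking with Integer.MAX_VALUE clears the sign bit so the result is never negative.
        return (keyHash & Integer.MAX_VALUE) % numReduceTasks;
    }

    /** Assumption: the key hashes like java.lang.Long; verify against your LongWritable. */
    static int hashOf(long key) {
        return Long.hashCode(key);
    }

    public static void main(String[] args) {
        int numReduceTasks = 100; // hypothetical; not stated in the thread
        long[] keys = {391L, 591L, 2391L, 3291L}; // keys printed in the reducer stdout
        for (long key : keys) {
            int hash = hashOf(key);
            System.out.println("key=" + key + " hash=" + hash
                    + " partition=" + partitionFor(hash, numReduceTasks));
        }
    }
}
```

Under these assumptions all four keys map to partition 91. The partition only decides which reduce task receives a key. Which records are grouped into one reduce() call is decided separately, by how the keys compare during the sort.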
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Harshit Mathur
>>>>
>>>
>>>
>>
>
>
> --
> Harshit Mathur
>
