hadoop-mapreduce-user mailing list archives

From Harshit Mathur <mathursh...@gmail.com>
Subject Re: Reducer called twice for same key
Date Mon, 29 Jun 2015 06:01:17 GMT
Can you share PALReducer also?
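
On the earlier point that one key cannot reach two different reducers: with the default HashPartitioner the reducer index is computed from the key's hashCode() alone, so equal keys always map to the same partition. Below is a minimal plain-Java sketch of that arithmetic. The class name, the reducer count of 4, and the use of Long.hashCode as a stand-in for the key's hashCode() are assumptions for illustration, not taken from the job in this thread.

```java
// Hypothetical sketch; runs without Hadoop on the classpath.
class PartitionSketch {
    // Same arithmetic as Hadoop's default HashPartitioner:
    // (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks
    static int partitionFor(int keyHash, int numReduceTasks) {
        // Masking with Integer.MAX_VALUE clears the sign bit so the result is never negative.
        return (keyHash & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int numReduceTasks = 4; // assumed value, not from the thread
        long[] keys = {391L, 591L, 2391L, 3291L}; // keys seen in the quoted stdout
        for (long k : keys) {
            // Stand-in hash; in a real job this would be key.hashCode() on the emitted key.
            int h = Long.hashCode(k);
            System.out.println(k + " -> partition " + partitionFor(h, numReduceTasks));
        }
    }
}
```

To see the real hash values in a job, one option is to print key.hashCode() in the mapper right before emitting the key.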

On Mon, Jun 29, 2015 at 11:21 AM, Ravikant Dindokar <ravikant.iisc@gmail.com
> wrote:

> Adding source code for more clarity.
>
> The problem statement is simple:
>
> PartitionFileMapper: takes an input file of tab-separated values V, P
> and emits (V, -1#P).
>
> ALFileMapper: takes an input file of tab-separated values V, EL
> and emits (V, E#-1).
>
> In the reducer I want to emit
> (V,E#P)
>
> Thanks
> Ravikant
>
> On Mon, Jun 29, 2015 at 11:04 AM, Ravikant Dindokar <
> ravikant.iisc@gmail.com> wrote:
>
>> By custom key, did you mean a custom class object? If so, no.
>>
>> I have two map methods, each with a different file as input, and both
>> emit a *LongWritable* key. But in the container's stdout file I see the
>> following (key and value separated by ':'):
>>
>>
>> ./container_1435326857837_0036_01_000102/stdout:Reduce:*391*:-1#11
>> ./container_1435326857837_0036_01_000102/stdout:Reduce:*391*:3278620528725786624:5352454#-1
>>
>> For key 391 the reducer is called twice: once for the value from the
>> first map and once for the value from the other map.
>>
>> In the map method I parse the string from the input file into a long and
>> then emit it as a LongWritable.
>>
>> Is there something I am missing when I use MultipleInputs
>> (org.apache.hadoop.mapreduce.lib.input.MultipleInputs)?
>>
>> Thanks
>> Ravikant
>>
>> On Mon, Jun 29, 2015 at 9:22 AM, Harshit Mathur <mathursharp@gmail.com>
>> wrote:
>>
>>> In MapReduce, two different reducers cannot receive the same key.
>>> Have you created a custom key type? If so, there is probably an issue
>>> with its comparator.
>>>
>>> On Mon, Jun 29, 2015 at 12:40 AM, Ravikant Dindokar <
>>> ravikant.iisc@gmail.com> wrote:
>>>
>>>> Hi Hadoop users,
>>>>
>>>> I have two map classes processing two different input files. Both map
>>>> functions emit the same key/value format.
>>>>
>>>> But the reducer is called twice for the same key: once for the value
>>>> from the first map and once for the value from the other map.
>>>>
>>>> I am printing the (key, value) pairs in the reducer:
>>>> ./container_1435326857837_0036_01_000102/stdout:Reduce:391:-1#11
>>>> ./container_1435326857837_0036_01_000102/stdout:Reduce:391:3278620528725786624:5352454#-1
>>>> ./container_1435326857837_0036_01_000102/stdout:Reduce:591:3278620528725852160:4194699#-1
>>>> ./container_1435326857837_0036_01_000102/stdout:Reduce:591:-1#13
>>>> ./container_1435326857837_0036_01_000102/stdout:Reduce:2391:-1#19
>>>> ./container_1435326857837_0036_01_000102/stdout:Reduce:2391:3278620528725917696:5283986#-1
>>>> ./container_1435326857837_0036_01_000102/stdout:Reduce:3291:3278620528725983232:4973087#-1
>>>>
>>>> Both maps emit a LongWritable key and a Text value.
>>>>
>>>>
>>>> Any idea why this is happening?
>>>> Is there any way to get the hash values generated by Hadoop for keys
>>>> emitted by the mapper?
>>>>
>>>> Thanks
>>>> Ravikant
>>>>
>>>
>>>
>>>
>>> --
>>> Harshit Mathur
>>>
>>
>>
>
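
For reference, the join described in the quoted problem statement, (V, -1#P) and (V, E#-1) in and (V, E#P) out, could be sketched roughly as below. This is not the PALReducer from this thread. The class and method names are hypothetical, and it assumes each value has the form "left#right" with "-1" marking the missing side. It is plain Java so that it runs without Hadoop.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper illustrating the merge step only.
class JoinSketch {
    // Merge every value that arrives for one key into a single "E#P" string.
    static String merge(Iterable<String> values) {
        String edges = "-1";     // left side, filled from the (V, E#-1) record
        String partition = "-1"; // right side, filled from the (V, -1#P) record
        for (String v : values) {
            int sep = v.lastIndexOf('#');
            if (sep < 0) {
                continue; // skip values that do not match the assumed format
            }
            String left = v.substring(0, sep);
            String right = v.substring(sep + 1);
            if (!left.equals("-1")) {
                edges = left;
            }
            if (!right.equals("-1")) {
                partition = right;
            }
        }
        return edges + "#" + partition;
    }

    public static void main(String[] args) {
        // The two values printed for key 391 in the quoted stdout.
        List<String> valuesFor391 =
            Arrays.asList("-1#11", "3278620528725786624:5352454#-1");
        System.out.println("391\t" + merge(valuesFor391));
    }
}
```

In a real job this loop would sit inside reduce(LongWritable key, Iterable<Text> values, Context context), where all values for one key arrive through the Iterable of a single call.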


-- 
Harshit Mathur
