hadoop-user mailing list archives

From Shahab Yunus <shahab.yu...@gmail.com>
Subject Re: how to assign unique ID (Long Value) in mapper
Date Fri, 26 Jun 2015 12:27:29 GMT
I see two issues here that go somewhat against the architecture and idea of
M/R (and of distributed and parallel programming models in general).

1- The map and reduce tasks are supposed to be shared-nothing, independent
tasks. If you add functionality like this, where you need to make sure that
some data is unique across all the map or reduce tasks, then you are no
longer 'shared nothing' and you give up the advantages of that model.

2- As a consequence of #1, if you add a need for common data between map or
reduce tasks, you are adding a bottleneck that can cause performance
issues, with concurrency and race-condition problems on top of that.

Having said that, perhaps ZooKeeper or a similar coordination framework
could be used to achieve what you want, though I think the issues that I
highlighted above would still apply. It could be a very tricky design.
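
If the IDs only need to be unique and not consecutive, one way to stay
shared-nothing is to let each task build its IDs from its own task index
plus a local counter. The sketch below is only an illustration under
assumptions: the class name TaskLocalIdGenerator and the 40/23 bit split are
made up for this example, and it is plain Java with no Hadoop dependency. In
a real mapper the task index could presumably be read once in setup() from
context.getTaskAttemptID().getTaskID().getId().

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Sketch: IDs that are unique across tasks without any shared state.
 * Each task packs its own index into the high bits of a long and a
 * task-local counter into the low bits.
 */
class TaskLocalIdGenerator {
    // Assumption: 40 bits for the per-task counter, which leaves 23 bits
    // for the task index and keeps the sign bit clear.
    static final int COUNTER_BITS = 40;
    static final long MAX_COUNTER = (1L << COUNTER_BITS) - 1;
    static final int MAX_TASK_INDEX = (1 << (63 - COUNTER_BITS)) - 1;

    private final long prefix;   // task index shifted into the high bits
    private long counter = 0;    // number of IDs handed out by this task

    TaskLocalIdGenerator(int taskIndex) {
        if (taskIndex < 0 || taskIndex > MAX_TASK_INDEX) {
            throw new IllegalArgumentException("task index out of range: " + taskIndex);
        }
        this.prefix = ((long) taskIndex) << COUNTER_BITS;
    }

    /** Returns the next ID for this task; no coordination with other tasks. */
    long next() {
        if (counter > MAX_COUNTER) {
            throw new IllegalStateException("per-task counter exhausted");
        }
        return prefix | counter++;
    }

    public static void main(String[] args) {
        // Simulate three independent tasks, each numbering five input lines.
        Set<Long> seen = new HashSet<>();
        for (int task = 0; task < 3; task++) {
            TaskLocalIdGenerator gen = new TaskLocalIdGenerator(task);
            for (int line = 0; line < 5; line++) {
                long id = gen.next();
                if (!seen.add(id)) {
                    throw new AssertionError("duplicate id " + id);
                }
                System.out.println("task " + task + ", line " + line + " -> id " + id);
            }
        }
    }
}
```

The resulting values are unique but have gaps between tasks, so they are
not true line numbers. Producing contiguous line numbers would still need
some extra step, for example a first pass that counts the records per split.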

Just my 2 cents.

Regards,
Shahab

On Fri, Jun 26, 2015 at 5:29 AM, Ravikant Dindokar <ravikant.iisc@gmail.com>
wrote:

> The problem can be thought of as assigning a line number to each line. Is
> there any built-in functionality in Hadoop that can do this?
>
> On Fri, Jun 26, 2015 at 1:11 PM, Ravikant Dindokar <
> ravikant.iisc@gmail.com> wrote:
>
>> Yes, there can be loops in the graph.
>>
>> On Fri, Jun 26, 2015 at 9:09 AM, Harshit Mathur <mathursharp@gmail.com>
>> wrote:
>>
>>> Are there loops in your graph?
>>>
>>>
>>> On Thu, Jun 25, 2015 at 10:39 PM, Ravikant Dindokar <
>>> ravikant.iisc@gmail.com> wrote:
>>>
>>>> Hi Hadoop user,
>>>>
>>>> I have a file containing one line for each edge in the graph with two
>>>> vertex ids (source & sink).
>>>> sample:
>>>> 1    2 (here 1 is source and 2 is sink node for the edge)
>>>> 1    5
>>>> 2    3
>>>> 4    2
>>>> 4    3
>>>> I want to assign a unique ID (a long value) to each edge, i.e. to each
>>>> line of the file.
>>>>
>>>> How can I ensure that the assigned values are unique across the
>>>> distributed mapper processes?
>>>>
>>>> Note: The file is large, so using only one reducer is not feasible.
>>>>
>>>> Thanks
>>>> Ravikant
>>>>
>>>
>>>
>>>
>>> --
>>> Harshit Mathur
>>>
>>
>>
>
