hadoop-user mailing list archives

From gabriel balan <gabriel.ba...@oracle.com>
Subject Re: how to assign unique ID (Long Value) in mapper
Date Mon, 29 Jun 2015 19:34:47 GMT
Hi

Rather than trying to work out the line number of the current line, you can use the byte
offset at which the line starts.
Within a file it's just as unique as the line number, and much easier to obtain: TextInputFormat
(a FileInputFormat) already passes it to the mapper as the key. From its documentation:

    Keys are the position in the file, and values are the line of text.
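
To illustrate why the offset works as an id, here is a minimal sketch in plain Java with no Hadoop dependency. It only mimics the keys a line-oriented reader would produce; in a real job you would not compute anything, since the offset arrives as the LongWritable key passed to map(). The class and method names (LineOffsets, lineStartOffsets) are made up for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop: lists the byte offset at which
// each line starts, which is the value a line-oriented input format would
// hand to the mapper as the key.
class LineOffsets {

    // Returns the start offset of every line in data, using '\n' as the
    // line terminator. No two lines can share a start offset.
    static List<Long> lineStartOffsets(byte[] data) {
        List<Long> offsets = new ArrayList<>();
        long lineStart = 0;
        for (int i = 0; i < data.length; i++) {
            if (data[i] == '\n') {
                offsets.add(lineStart);
                lineStart = i + 1L;
            }
        }
        if (lineStart < data.length) {
            offsets.add(lineStart); // final line without a trailing newline
        }
        return offsets;
    }

    public static void main(String[] args) {
        // The sample edge list from the original question, tab-separated.
        byte[] edges = "1\t2\n1\t5\n2\t3\n4\t2\n4\t3\n"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(lineStartOffsets(edges)); // prints [0, 4, 8, 12, 16]
    }
}
```

Note that the ids are unique but not consecutive: they grow by the length of each line, not by one.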

If you have multiple files, you may want to combine the file offset with the file name (path)
to get a unique id. See "How to get the input file name in the mapper" for how to obtain the path.
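
One possible way to combine the two is sketched below, again in plain Java. The string form (path plus offset) is the direct reading of the suggestion above. The packed long form is my own assumption for the case where the id must be a Long: it presumes you can give each input file a small integer index (for example its position in the sorted list of input paths), that there are fewer than 2^23 files, and that every file is smaller than 2^40 bytes. EdgeIds, textId and packedId are hypothetical names. In a mapper the path is typically available from the input split, e.g. ((FileSplit) context.getInputSplit()).getPath().

```java
// Hypothetical helper, not part of Hadoop: builds a per-line id from a
// file identity and the line's byte offset within that file.
class EdgeIds {

    // Assumption: every input file is smaller than 2^40 bytes (1 TiB).
    static final int OFFSET_BITS = 40;
    // Assumption: fewer than 2^23 input files, so the id fits in 63 bits
    // and stays non-negative.
    static final int MAX_FILES = 1 << 23;

    // String id: unique as long as the paths are distinct.
    static String textId(String path, long offset) {
        return path + ":" + offset;
    }

    // Long id: file index in the high bits, byte offset in the low bits.
    static long packedId(int fileIndex, long offset) {
        if (fileIndex < 0 || fileIndex >= MAX_FILES) {
            throw new IllegalArgumentException("file index out of range: " + fileIndex);
        }
        if (offset < 0 || offset >= (1L << OFFSET_BITS)) {
            throw new IllegalArgumentException("offset out of range: " + offset);
        }
        return ((long) fileIndex << OFFSET_BITS) | offset;
    }

    public static void main(String[] args) {
        // The same offset in two different files yields two different ids.
        System.out.println(packedId(0, 8)); // prints 8
        System.out.println(packedId(1, 8)); // prints 1099511627784
    }
}
```

If either size assumption does not hold for your data, the bit split has to be adjusted, or the string id used instead.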

hth
Gabriel Balan

On 6/26/2015 5:29 AM, Ravikant Dindokar wrote:
> The problem can be thought of as assigning a line number to each line. Is there any built-in functionality in Hadoop which can do this?
>
> On Fri, Jun 26, 2015 at 1:11 PM, Ravikant Dindokar <ravikant.iisc@gmail.com> wrote:
>
>     Yes, there can be loops in the graph.
>
>     On Fri, Jun 26, 2015 at 9:09 AM, Harshit Mathur <mathursharp@gmail.com> wrote:
>
>         Are there loops in your graph?
>
>
>         On Thu, Jun 25, 2015 at 10:39 PM, Ravikant Dindokar <ravikant.iisc@gmail.com> wrote:
>
>             Hi Hadoop user,
>
>             I have a file containing one line for each edge in the graph with two vertex ids (source & sink).
>             sample:
>             1    2 (here 1 is source and 2 is sink node for the edge)
>             1    5
>             2    3
>             4    2
>             4    3
>             I want to assign a unique ID (Long value) to each edge, i.e. to each line of the file.
>
>             How to ensure assignment of unique value in distributed mapper process?
>
>             Note : File size is large, so using only one reducer is not feasible.
>
>             Thanks
>             Ravikant
>
>
>
>
>         -- 
>         Harshit Mathur
>
>
>

-- 
The statements and opinions expressed here are my own and do not necessarily represent those
of Oracle Corporation.

