hadoop-common-user mailing list archives

From Amogh Vasekar <am...@yahoo-inc.com>
Subject Re: Input file format doubt
Date Thu, 28 Jan 2010 10:14:14 GMT
For global line numbers, you would need to know the ordering of records within each split
generated from the input file, and how many lines come before each split. The standard input
formats provide byte offsets for the records in a split, so if the records are all of equal
length you can compute the line number from the offset.
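
As a rough illustration of the equal-length case, the arithmetic could look like the sketch
below. It assumes every record occupies exactly record_length bytes including the line
terminator, and that the offset is measured from the start of the file. The function name and
parameters are hypothetical and not part of any Hadoop API.

```python
def line_number_from_offset(byte_offset: int, record_length: int) -> int:
    """Return the 1-based line number of the record starting at byte_offset.

    Assumption: every record is exactly record_length bytes long,
    line terminator included, and offsets count from the start of the file.
    """
    if record_length <= 0:
        raise ValueError("record_length must be positive")
    if byte_offset < 0 or byte_offset % record_length != 0:
        raise ValueError("byte_offset is not on a record boundary")
    # Records before this one, plus one for 1-based numbering.
    return byte_offset // record_length + 1
```

If the records vary in length, this calculation no longer holds and the number of lines in
the earlier splits has to be counted some other way.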
I remember someone implementing sequential numbering using the partition id of each map
task (mapred.task.partition) and posting it on his blog. I don't have the link handy right
now, but I will send it to you off-list if I find it.
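
One possible way to use the partition id, which is not necessarily what that blog post did, is
a two-pass scheme: a first pass counts the lines seen by each map task, and a second pass adds
the number of lines in all lower-numbered partitions to a local counter. The sketch below
shows only the prefix-sum step. It assumes that partition ids follow the order of the splits
in the file, which should be verified for the job at hand, and the function name is
hypothetical.

```python
from itertools import accumulate
from typing import Dict


def starting_line_numbers(counts_by_partition: Dict[int, int]) -> Dict[int, int]:
    """Map each partition id to the 1-based global line number of its first line.

    Assumption: partition ids increase in the same order as the splits
    appear in the input file.
    """
    ids = sorted(counts_by_partition)
    counts = [counts_by_partition[pid] for pid in ids]
    # Exclusive prefix sum: number of lines that precede each partition.
    preceding = [0] + list(accumulate(counts))[:-1]
    return {pid: before + 1 for pid, before in zip(ids, preceding)}
```

A map task in the second pass would then number its lines starting from the value looked up
for its own partition id.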


On 1/28/10 3:29 PM, "Udaya Lakshmi" <udaya603@gmail.com> wrote:

Hi all..
  I have searched the documentation but could not find an input file
format which will give the line number as the key and the line as the value.
Did I miss something? Can someone give me a clue about how to implement
such an input file format?

