hadoop-common-user mailing list archives

From Ravi <ravindra.babu.rav...@gmail.com>
Subject Re: Input file format doubt
Date Thu, 28 Jan 2010 10:33:37 GMT
Thank you Amogh.

On Thu, Jan 28, 2010 at 3:44 PM, Amogh Vasekar <amogh@yahoo-inc.com> wrote:

> Hi,
> For global line numbers, you would need to know the ordering within each
> split generated from the input file. The standard input formats provide
> offsets as keys, so if the records are of equal length you can compute
> a line number by dividing the offset by the record length.
> I remember someone had implemented sequential numbering using the partition
> id for each map task (mapred.task.partition) and posted this on his blog. I
> don't have the link handy right now, but I will send it to you off-list if I
> find it.
> Amogh
>
> On 1/28/10 3:29 PM, "Udaya Lakshmi" <udaya603@gmail.com> wrote:
> Hi all..
>  I have searched the documentation but could not find an input file
> format which gives the line number as the key and the line as the value.
> Did I miss something? Can someone give me a pointer on how to implement
> such an input file format?
> Thanks,
> Udaya.
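
For reference, the arithmetic behind the two approaches Amogh mentions can be sketched in plain Java. This is only a minimal sketch and not Hadoop code: the class LineNumbers and its methods are hypothetical names, the record length is assumed to include the line terminator, and the second approach assumes that a first pass has already counted the lines in each split and that partition ids follow the order of the splits in the file, which should be verified for your job.

```java
// Hypothetical helper (not part of Hadoop): plain arithmetic only.
final class LineNumbers {
    private LineNumbers() {}

    // Approach 1: fixed-length records.
    // byteOffset is the offset of a record in the file; recordLength is
    // assumed to include the line terminator. Returns a 0-based line number.
    static long fromOffset(long byteOffset, int recordLength) {
        if (recordLength <= 0) {
            throw new IllegalArgumentException("recordLength must be positive");
        }
        if (byteOffset < 0 || byteOffset % recordLength != 0) {
            throw new IllegalArgumentException("offset is not on a record boundary");
        }
        return byteOffset / recordLength;
    }

    // Approach 2: variable-length records.
    // linesPerSplit[i] is the number of lines counted in the split handled by
    // the map task with partition id i (assumed to be gathered in a first pass).
    // Returns the 0-based global line number of the first line of each split.
    static long[] firstLineOfEachSplit(long[] linesPerSplit) {
        long[] first = new long[linesPerSplit.length];
        long running = 0;
        for (int i = 0; i < linesPerSplit.length; i++) {
            first[i] = running;
            running += linesPerSplit[i];
        }
        return first;
    }

    // Combines a split's starting line with the 0-based line index inside it.
    static long globalLineNumber(long[] firstLine, int partitionId, long localLine) {
        return firstLine[partitionId] + localLine;
    }
}
```

In a real job, a mapper would read its own partition id (mapred.task.partition, as mentioned above), keep a local counter of the lines it has seen, and add the starting line for its split to that counter.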
