hadoop-common-user mailing list archives

From Tarandeep Singh <tarand...@gmail.com>
Subject Re: hwo to read a text file in Map function until reaching specific line
Date Fri, 26 Jun 2009 18:30:23 GMT
TextInputFormat gives the byte offset of each line within the file as the key
and the entire line as the value, not the line number, so it won't work for you.
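
To illustrate why the key cannot be compared against 500, here is a small plain-Java sketch (no Hadoop classes involved) that computes the starting byte offset of each line, which is the kind of value TextInputFormat hands to the mapper as the key. The class and method names are made up for this example, and it assumes UTF-8 text with '\n' line endings.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop: mimics the byte-offset keys
// that a line-oriented input format would produce for a small file.
class ByteOffsetSketch {

    // Returns the byte offset at which each line starts
    // (assumes UTF-8 encoding and '\n' as the line terminator).
    static List<Long> lineStartOffsets(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        List<Long> offsets = new ArrayList<>();
        boolean atLineStart = true;
        for (int i = 0; i < bytes.length; i++) {
            if (atLineStart) {
                offsets.add((long) i);
                atLineStart = false;
            }
            if (bytes[i] == '\n') {
                atLineStart = true;
            }
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Three lines: the keys are byte positions, not 0, 1, 2.
        System.out.println(lineStartOffsets("alpha\nbeta\ngamma\n"));
    }
}
```

The third line gets key 11 rather than 2, so a test such as "key < 500" would select lines by position in bytes, not by line count.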

You can modify NLineInputFormat to achieve what you want. NLineInputFormat
gives each mapper N lines (in your case N=500).

Since you are interested in only the first 500 lines of each file, the record
reader for the modified NLineInputFormat would be implemented as:

get the input split
check the start pos
if start pos == 0
  read the first 500 lines
else
  you have got a file split that is in the middle of the file; don't bother to
  read anything, as the mapper that is reading from the beginning of the file
  is reading the first 500 lines. Just indicate no more input.
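
The steps above can be sketched in plain Java as follows. This is only a simulation of the decision logic, not a real Hadoop RecordReader: the class name, the method name, and the use of an Iterator<String> in place of the split's line reader are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the record reader logic; it uses no Hadoop types.
class FirstLinesSketch {

    // splitStart: byte offset at which this split begins in the file.
    // lines:      the lines available to this reader, in file order.
    // maxLines:   how many leading lines to keep (500 in this thread).
    static List<String> readSplit(long splitStart, Iterator<String> lines, int maxLines) {
        List<String> records = new ArrayList<>();
        if (splitStart != 0) {
            // Split from the middle of the file: report no input at all,
            // because the reader that starts at offset 0 covers the first lines.
            return records;
        }
        while (records.size() < maxLines && lines.hasNext()) {
            records.add(lines.next());
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> file = Arrays.asList("l1", "l2", "l3", "l4");
        System.out.println(readSplit(0L, file.iterator(), 2));   // split at the start of the file
        System.out.println(readSplit(128L, file.iterator(), 2)); // split from the middle of the file
    }
}
```

In a real record reader the same check would presumably live where the reader decides whether another record is available, returning "no more input" immediately for any split whose start position is not 0.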


On Fri, Jun 26, 2009 at 10:35 AM, Ramakishore Yelamanchilli <
kyelaman@cisco.com> wrote:

> I think the map function gets the line number as key. You can ignore the other
> lines after the key value 500.
> Thanks
> -----Original Message-----
> From: Leiz [mailto:lzhang32@gmail.com]
> Sent: Friday, June 26, 2009 8:57 AM
> To: core-user@hadoop.apache.org
> Subject: hwo to read a text file in Map function until reaching specific
> line
> For example, I have a text file with 1000 lines.
> I only want to read the first 500 lines of the file.
> How can I do this in the Map function?
> Thanks
