hadoop-common-user mailing list archives

From Jeff Zhang <zjf...@gmail.com>
Subject Re: Fileformat query
Date Fri, 29 Jan 2010 01:54:28 GMT
Sorry, I was mistaken: writing your own InputFormat does not seem like a
good idea after all. Finding the starting line number of each split is
rather costly, since the lines in all of the data before the split have to
be counted first.
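
To make the cost concrete, here is a minimal sketch in plain Java (no Hadoop
classes). It assumes the whole input fits in a byte array and is cut into
fixed-size splits; the class and method names are hypothetical and only for
illustration. It shows that the line number at the start of a split depends
on a newline count over every earlier byte, while the fixed-width case
reduces to a single division.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not part of Hadoop.
class SplitLineNumbers {

    // Counts '\n' bytes in data[from, to).
    static long countNewlines(byte[] data, int from, int to) {
        long n = 0;
        for (int i = from; i < to; i++) {
            if (data[i] == '\n') {
                n++;
            }
        }
        return n;
    }

    // For each split of splitSize bytes, returns the number of newlines
    // that occur before the split's first byte. That count is the 0-based
    // number of the line containing the first byte of the split.
    // Note the running total: split s needs the counts of splits 0..s-1.
    static long[] newlinesBeforeSplits(byte[] data, int splitSize) {
        int numSplits = (data.length + splitSize - 1) / splitSize;
        long[] before = new long[numSplits];
        long running = 0;
        for (int s = 0; s < numSplits; s++) {
            before[s] = running;
            int from = s * splitSize;
            int to = Math.min(from + splitSize, data.length);
            running += countNewlines(data, from, to);
        }
        return before;
    }

    // Fixed-width records (recordWidth bytes each, newline included):
    // the 0-based line number follows directly from the byte offset.
    static long lineOfOffsetFixedWidth(long offset, int recordWidth) {
        return offset / recordWidth;
    }

    public static void main(String[] args) {
        byte[] data = "a\nbb\nccc\ndddd\n".getBytes(StandardCharsets.US_ASCII);
        long[] before = newlinesBeforeSplits(data, 5);
        for (int s = 0; s < before.length; s++) {
            System.out.println("split " + s + " starts on line " + before[s]);
        }
        System.out.println(lineOfOffsetFixedWidth(30, 10)); // prints 3
    }
}
```

In a real job the splits are read by different map tasks, so this running
total would need an extra pass over the data before the job starts, which
is the overhead I mean.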



On Fri, Jan 29, 2010 at 8:40 AM, Jeff Zhang <zjffdu@gmail.com> wrote:

> I'm afraid you will have to write your own InputFormat if you really want
> the line number as the key.
> I believe you can reuse most of the code of TextInputFormat, since your
> InputFormat would be almost the same as TextInputFormat except for the key.
>
>
>
>
> On Thu, Jan 28, 2010 at 7:35 AM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
>
>> On Thu, Jan 28, 2010 at 4:01 AM, Udaya Lakshmi <udaya603@gmail.com>
>> wrote:
>> > Hi all..
>> >   I have searched the documentation but could not find an input file
>> > format that gives the line number as the key and the line as the value.
>> > Did I miss something? Can someone give me a clue on how to implement
>> > such an input file format?
>> >
>> > Thanks,
>> > Udaya.
>> >
>>
>>
>> Udaya,
>>
>> When using the standard File Input Format:
>>
>> public void map(LongWritable key, Text value,
>>                 OutputCollector<Text, IntWritable> output,
>>                 Reporter reporter) throws IOException {
>>
>> key represents the byte offset of the line in the input file. There is
>> no easy way to translate the byte offset to a logical line number,
>> unless all lines are fixed width (not usually the case).
>>
>> Edward
>>
>
>
>
> --
> Best Regards
>
> Jeff Zhang
>



-- 
Best Regards

Jeff Zhang
