hadoop-mapreduce-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Creating custom input split.l
Date Sat, 09 Apr 2011 08:18:53 GMT
Hello Ranjith,

On Thu, Apr 7, 2011 at 10:26 AM, ranjith k <ranjith42k@gmail.com> wrote:
> Hello.
> I need to create a custom input split. I need to split my input into
> 50-line input splits. How can I do it?

Maybe you are looking for NLineInputFormat? It creates one input
split for every N lines of input, where N is configurable.
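
To make the idea concrete, here is a minimal plain-Java sketch of
splitting by line count. It is not Hadoop code and not how
NLineInputFormat is implemented internally; the class and method names
(NLineSplitSketch, computeSplits, Split) are made up for illustration.
It walks the input once and closes a byte range after every N newlines,
so with N = 50 each range covers 50 lines (the last one may be shorter).

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Illustration only: plain Java, hypothetical names, not the Hadoop API.
class NLineSplitSketch {

    /** Byte range of one split: start offset and length in bytes. */
    static final class Split {
        final long start;
        final long length;

        Split(long start, long length) {
            this.start = start;
            this.length = length;
        }

        @Override
        public String toString() {
            return "Split[start=" + start + ", length=" + length + "]";
        }
    }

    /** Walks the bytes once and closes a split after every linesPerSplit newlines. */
    static List<Split> computeSplits(byte[] data, int linesPerSplit) {
        if (linesPerSplit <= 0) {
            throw new IllegalArgumentException("linesPerSplit must be positive");
        }
        List<Split> splits = new ArrayList<>();
        long start = 0;
        int linesInSplit = 0;
        for (int i = 0; i < data.length; i++) {
            if (data[i] == '\n') {
                linesInSplit++;
                if (linesInSplit == linesPerSplit) {
                    splits.add(new Split(start, i + 1 - start));
                    start = i + 1;
                    linesInSplit = 0;
                }
            }
        }
        // Leftover lines (fewer than N, or a last line without a newline)
        // form the final, shorter split.
        if (start < data.length) {
            splits.add(new Split(start, data.length - start));
        }
        return splits;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i <= 120; i++) {
            sb.append("line ").append(i).append('\n');
        }
        byte[] data = sb.toString().getBytes(StandardCharsets.UTF_8);
        // 120 lines at 50 lines per split gives ranges of 50, 50 and 20 lines.
        for (Split s : computeSplits(data, 50)) {
            System.out.println(s);
        }
    }
}
```

In a real job you would not write this yourself; you would set
NLineInputFormat as the job's input format and configure N to 50.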

> There is also another problem for me. I have a file, but it is not in
> the form of text. It contains structures. I need to give one structure to
> my map function as the value, with the record number as my key. How can I
> achieve this? Please help me.

You will need to implement a custom RecordReader for this; basically
you'll have to read your file and build each structure to your specs
using low-level byte reads off a DFS input stream for your file.
Computing the number of records in the same pass may not be possible if
the file/split is too large to be held in memory, but you could create a
SequenceFile out of it, with the record count as the key and a chunk of
records as the value.
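
As a rough sketch of the byte-level reading part, the plain-Java example
below decodes fixed-width binary records from a stream and hands out the
record number as the key and the decoded structure as the value. It is
not a Hadoop RecordReader. The layout (a 4-byte int followed by an
8-byte double) and all names (StructRecordReaderSketch, Sample, Reader,
encode, readAll) are assumptions made up for illustration; your
structure will differ. The method names only loosely mirror those of
Hadoop's newer RecordReader API. Note that the key here is the record
number within this stream, so a reader working on one split of a larger
file would not know the global record number without extra information.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Illustration only: plain Java, hypothetical names and record layout.
class StructRecordReaderSketch {

    /** Hypothetical structure: 4-byte int + 8-byte double = 12 bytes per record. */
    static final class Sample {
        final int id;
        final double value;

        Sample(int id, double value) {
            this.id = id;
            this.value = value;
        }

        @Override
        public String toString() {
            return "Sample[id=" + id + ", value=" + value + "]";
        }
    }

    /** Reads one structure per call; the key is the zero-based record number. */
    static final class Reader {
        private final DataInputStream in;
        private long key = -1;
        private Sample value;

        Reader(InputStream raw) {
            this.in = new DataInputStream(raw);
        }

        boolean nextKeyValue() throws IOException {
            int id;
            try {
                id = in.readInt();
            } catch (EOFException e) {
                // End of input. A partial trailing int is also treated as the end here.
                return false;
            }
            // An EOFException here means a truncated record, so it is allowed to propagate.
            double v = in.readDouble();
            key++;
            value = new Sample(id, v);
            return true;
        }

        long getCurrentKey() { return key; }

        Sample getCurrentValue() { return value; }
    }

    /** Helper for the demo: writes records in the assumed binary layout. */
    static byte[] encode(int[] ids, double[] values) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            for (int i = 0; i < ids.length; i++) {
                out.writeInt(ids[i]);
                out.writeDouble(values[i]);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Helper for the demo: reads every record and formats it as "key -> value". */
    static List<String> readAll(byte[] data) {
        List<String> result = new ArrayList<>();
        try {
            Reader reader = new Reader(new ByteArrayInputStream(data));
            while (reader.nextKeyValue()) {
                result.add(reader.getCurrentKey() + " -> " + reader.getCurrentValue());
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = encode(new int[] {7, 9, 11}, new double[] {1.5, 2.25, 3.0});
        for (String line : readAll(data)) {
            System.out.println(line);
        }
    }
}
```

In Hadoop the same decoding loop would sit inside your RecordReader,
reading from the stream you open on the split's file instead of an
in-memory byte array.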

Harsh J
