hadoop-common-user mailing list archives

From Owen O'Malley <omal...@apache.org>
Subject Re: changing SequenceFile format
Date Mon, 13 Sep 2010 20:42:18 GMT

On Sep 13, 2010, at 12:11 PM, Matthew John wrote:

> The terasort input you have implemented is text type, and the input is in
> line format, whereas I am dealing with a binary sequence file. For my
> requirement I have created two Writable implementations for the key and
> value respectively.

I would just use BytesWritable directly. The reader/writer should
insist on the fixed lengths, not the types. The only restriction is
that you can't use the BytesWritable readFields and write methods,
since they store a length prefix with each value. You'll need to do
the reading and writing in the file reader and writer instead.
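A minimal sketch of the "fixed lengths, not types" idea, assuming a hypothetical layout of a 10-byte key followed by a 90-byte value. It uses only java.io so it runs without Hadoop, and the class and method names are made up for illustration. In a real reader the filled byte arrays would then be wrapped in BytesWritable objects (for example via its set(byte[], int, int) method) rather than deserialized through readFields.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Sketch: fixed-length binary records stored as raw bytes, with no per-record length prefix.
class FixedLengthRecords {
    // Hypothetical sizes, chosen for illustration only.
    static final int KEY_LEN = 10;
    static final int VALUE_LEN = 90;

    // Writes key and value bytes as-is. BytesWritable.write() would instead emit a
    // 4-byte length before each field, which is why the file writer does this itself.
    static void writeRecord(DataOutputStream out, byte[] key, byte[] value) throws IOException {
        if (key.length != KEY_LEN || value.length != VALUE_LEN) {
            throw new IllegalArgumentException(
                "expected " + KEY_LEN + "-byte key and " + VALUE_LEN + "-byte value");
        }
        out.write(key);
        out.write(value);
    }

    // Fills key and value with the next record. Returns false at a clean end of stream;
    // throws EOFException if the stream ends part-way through a record.
    static boolean readRecord(DataInputStream in, byte[] key, byte[] value) throws IOException {
        int first = in.read();
        if (first == -1) {
            return false;
        }
        key[0] = (byte) first;
        in.readFully(key, 1, KEY_LEN - 1);
        in.readFully(value);
        return true;
    }

    // Builds an in-memory "file" of n records: key bytes are all i, value bytes all i + 1.
    static byte[] encodeSample(int n) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            byte[] key = new byte[KEY_LEN];
            byte[] value = new byte[VALUE_LEN];
            for (int i = 0; i < n; i++) {
                Arrays.fill(key, (byte) i);
                Arrays.fill(value, (byte) (i + 1));
                writeRecord(out, key, value);
            }
            out.flush();
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads every record back and returns how many were found.
    static int countRecords(byte[] file) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(file));
            byte[] key = new byte[KEY_LEN];
            byte[] value = new byte[VALUE_LEN];
            int count = 0;
            while (readRecord(in, key, value)) {
                count++;
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] file = encodeSample(3);
        System.out.println(file.length);        // 3 * (10 + 90) bytes, no length prefixes
        System.out.println(countRecords(file));
    }
}
```

Here encodeSample(3) produces a 300-byte buffer holding exactly three records with nothing between them, and countRecords reads back 3. A file cut off mid-record fails with an EOFException (wrapped in UncheckedIOException in this sketch) instead of being silently accepted.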

> I assume I should also implement an InputFormat and OutputFormat along
> with these, but I am not able to figure out how to provide the
> corresponding FileSplit and RecordReader/RecordWriter.

To implement InputFormat, you'll need to implement getSplits and  
createRecordReader. You'll need to create a RecordReader class that  
understands your file's reader class. Once you implement an  
InputFormat, just set the class as the InputFormat for your job.
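For fixed-length records, the main thing getSplits and the RecordReader have to agree on is where records start. Below is a plain-Java model of that logic, not Hadoop's API: the Split class stands in for a FileSplit's start and length, getSplits and recordOffsets are hypothetical names, and the 100-byte record size is an assumption. Each split boundary is rounded down to a multiple of the record length so that no record straddles two splits.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of a fixed-length InputFormat. In Hadoop the split logic would
// live in getSplits() and the per-split loop in a RecordReader; names are hypothetical.
class FixedLengthSplits {
    // A byte range of the input file, analogous to a FileSplit's start and length.
    static final class Split {
        final long start;
        final long length;

        Split(long start, long length) {
            this.start = start;
            this.length = length;
        }
    }

    // Cuts a file into splits of about targetSize bytes, rounding the split size down
    // to a whole number of records (at least one) so every split starts on a record.
    static List<Split> getSplits(long fileLength, long targetSize, int recordLen) {
        if (fileLength % recordLen != 0) {
            throw new IllegalArgumentException("file length is not a multiple of the record length");
        }
        long perSplit = Math.max(recordLen, (targetSize / recordLen) * recordLen);
        List<Split> splits = new ArrayList<>();
        for (long start = 0; start < fileLength; start += perSplit) {
            splits.add(new Split(start, Math.min(perSplit, fileLength - start)));
        }
        return splits;
    }

    // What a RecordReader would do for one split: visit each record offset in its range.
    static List<Long> recordOffsets(Split split, int recordLen) {
        List<Long> offsets = new ArrayList<>();
        for (long pos = split.start; pos < split.start + split.length; pos += recordLen) {
            offsets.add(pos);
        }
        return offsets;
    }

    public static void main(String[] args) {
        int recordLen = 100; // hypothetical 10-byte key + 90-byte value
        for (Split s : getSplits(1_000, 350, recordLen)) {
            int n = recordOffsets(s, recordLen).size();
            System.out.println(s.start + "+" + s.length + ": " + n + " record(s)");
        }
    }
}
```

With a 1,000-byte file and a 350-byte target, this yields four splits of 300, 300, 300 and 100 bytes, covering all ten records exactly once. Aligning the splits up front is one design choice; the alternative is to keep arbitrary split boundaries and have each RecordReader round its start up to the next record boundary and read past its end to finish the last record, roughly as line-oriented readers do with partial lines.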

-- Owen
