hadoop-common-user mailing list archives

From Arun C Murthy <ar...@yahoo-inc.com>
Subject Re: Not allow file split
Date Wed, 07 May 2008 15:56:57 GMT

On May 7, 2008, at 6:30 AM, Roberto Zandonati wrote:

> Hi all, I'm a newbie and I have the following problem.
> I need to implement an InputFormat such that isSplitable always
> returns false, as shown in http://wiki.apache.org/hadoop/FAQ (question
> no. 10).
> And here is the problem.
> I also have to implement the RecordReader interface to return the
> whole content of the input file, but I don't know how. I have found
> only examples that use the LineRecordReader.

Couple of things.

1. Take a look at SequenceFileRecordReader: http://svn.apache.org/ 

2. If you just want to process a text file as a whole or a sequence
file as a whole (or any existing one), you do not need to implement a
'RecordReader' at all. Just sub-class the InputFormat, override
isSplitable, and the existing RecordReader will work correctly. Take a
look at SortValidator
(http://svn.apache.org/viewvc/hadoop/core/trunk/src/test/org/apache/hadoop/mapred/SortValidator.java)
and how it sub-classes
SequenceFileInputFormat to implement a  
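
The sub-classing idea in point 2 can be sketched roughly as below. This is a self-contained model, not Hadoop code: SketchInputFormat, WholeFileInputFormat, and the simplified isSplitable(String) and getSplits(...) signatures are hypothetical stand-ins invented for illustration. In the real old-style API the hook to override on a FileInputFormat sub-class is isSplitable(FileSystem, Path); everything else here is an assumption about how the split computation consults that hook.

```java
import java.util.ArrayList;
import java.util.List;

public class WholeFileSplitSketch {

    /** Hypothetical stand-in for a file-based input format (NOT Hadoop's class). */
    static class SketchInputFormat {

        /** Hook that sub-classes may override; plays the role of isSplitable. */
        protected boolean isSplitable(String fileName) {
            return true;
        }

        /**
         * Returns {offset, length} pairs covering a file of fileLength bytes.
         * Assumes splitSize is positive.
         */
        public List<long[]> getSplits(String fileName, long fileLength, long splitSize) {
            List<long[]> splits = new ArrayList<>();
            if (!isSplitable(fileName) || fileLength <= splitSize) {
                // One split spanning the whole file, so one reader sees all of it.
                splits.add(new long[] {0L, fileLength});
                return splits;
            }
            for (long offset = 0; offset < fileLength; offset += splitSize) {
                splits.add(new long[] {offset, Math.min(splitSize, fileLength - offset)});
            }
            return splits;
        }
    }

    /** Only the hook changes; the inherited split and read logic is reused as-is. */
    static class WholeFileInputFormat extends SketchInputFormat {
        @Override
        protected boolean isSplitable(String fileName) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A 250-byte file with a 100-byte split size.
        System.out.println(new SketchInputFormat().getSplits("part-00000", 250, 100).size());
        System.out.println(new WholeFileInputFormat().getSplits("part-00000", 250, 100).size());
    }
}
```

The point of the sketch is that the sub-class touches nothing but the hook: the record-reading side stays the existing one and simply gets handed a single split that covers the entire file.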

