hadoop-common-dev mailing list archives

From "Runping Qi (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-1204) Re-factor InputFormat/RecordReader related classes
Date Wed, 04 Apr 2007 19:49:32 GMT
Re-factor InputFormat/RecordReader related classes

                 Key: HADOOP-1204
                 URL: https://issues.apache.org/jira/browse/HADOOP-1204
             Project: Hadoop
          Issue Type: Bug
          Components: mapred
            Reporter: Runping Qi

This Jira is the first small step toward unifying the InputFormat/RecordReader code used by
Hadoop streaming with the main Hadoop framework.

It makes the following changes to clean up the related parts of the main framework.

1. Add a constructor
       public LineRecordReader(Configuration job, FileSplit split)
to LineRecordReader. This gives SequenceFileRecordReader and LineRecordReader constructors
with the same signature, which makes it easier to write a factory class that creates the
various record readers once the record reader classes for Hadoop streaming are brought
into the main framework.

2. Implement the next() method using the following newly added protected method of LineRecordReader:

     protected long readLine() throws IOException {
         return LineRecordReader.readLine(in, buffer);
     }

    This lets users easily override the readLine logic to use a different line terminator
(e.g. treat '\r' as part of the data rather than as a line break).

3. Rename class InputFormatBase to FileInputFormat to better reflect what the class does.
For backward compatibility, keep the InputFormatBase class, but make it a deprecated shell
class that simply extends FileInputFormat.

4. Change TextInputFormat and SequenceFileInputFormat to extend FileInputFormat.
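
To make the four items above concrete, here is a minimal, self-contained sketch of how they
might fit together. It is not the actual patch: Configuration, FileSplit, FileInputFormat and
the other types below are simplified stand-ins defined in the snippet itself, and
NewlineOnlyRecordReader, RecordReaders, create() and readAll() are hypothetical names
introduced only for illustration. The default line-splitting rule shown here is also an
assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins, NOT the real org.apache.hadoop classes.
class Configuration {}

class FileSplit {
    final byte[] data; // assumption: the split simply carries its bytes
    FileSplit(String text) { data = text.getBytes(StandardCharsets.UTF_8); }
}

class LineRecordReader {
    protected final InputStream in;
    protected final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

    // Item 1: the shared (Configuration, FileSplit) constructor shape.
    public LineRecordReader(Configuration job, FileSplit split) {
        this.in = new ByteArrayInputStream(split.data);
    }

    // Simplified default: a line ends at "\n", "\r" or "\r\n". Returns bytes consumed
    // (0 at end of input). The stream must support mark/reset.
    static long readLine(InputStream in, ByteArrayOutputStream out) throws IOException {
        long consumed = 0;
        int b;
        while ((b = in.read()) != -1) {
            consumed++;
            if (b == '\n') break;
            if (b == '\r') {
                in.mark(1);
                if (in.read() == '\n') consumed++; else in.reset();
                break;
            }
            out.write(b);
        }
        return consumed;
    }

    // Item 2: next() reads through this overridable hook.
    protected long readLine() throws IOException {
        return LineRecordReader.readLine(in, buffer);
    }

    // Returns the next line, or null at end of input.
    public String next() throws IOException {
        buffer.reset();
        if (readLine() == 0) return null;
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }
}

// Subclass that keeps '\r' as data and splits on '\n' only.
class NewlineOnlyRecordReader extends LineRecordReader {
    public NewlineOnlyRecordReader(Configuration job, FileSplit split) { super(job, split); }

    @Override
    protected long readLine() throws IOException {
        long consumed = 0;
        int b;
        while ((b = in.read()) != -1) {
            consumed++;
            if (b == '\n') break;
            buffer.write(b);
        }
        return consumed;
    }
}

// Items 3 and 4: the renamed base class, the deprecated shell, and one subclass.
abstract class FileInputFormat {}
@Deprecated
abstract class InputFormatBase extends FileInputFormat {}
class TextInputFormat extends FileInputFormat {}

// Hypothetical factory made possible by the shared constructor signature.
class RecordReaders {
    static LineRecordReader create(Class<? extends LineRecordReader> type, Configuration job, FileSplit split) {
        try {
            return type.getConstructor(Configuration.class, FileSplit.class).newInstance(job, split);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("no (Configuration, FileSplit) constructor", e);
        }
    }

    // Drains a reader into a list, for demonstration only.
    static List<String> readAll(LineRecordReader reader) {
        List<String> lines = new ArrayList<>();
        try {
            for (String s = reader.next(); s != null; s = reader.next()) lines.add(s);
        } catch (IOException e) { throw new UncheckedIOException(e); }
        return lines;
    }
}
```

With the input "a\r\nb\n", a reader created via RecordReaders.create(LineRecordReader.class, ...)
yields the lines "a" and "b", while one created with NewlineOnlyRecordReader.class yields
"a\r" and "b". Only readLine() differs between the two; next() and the constructor shape are
shared, which is what lets a single factory build either reader.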

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
