hadoop-common-dev mailing list archives

From "Gelesh (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HADOOP-9168) The Naming and Inheritance for RecordReader, LineRecordReader, LineReader
Date Wed, 26 Dec 2012 16:12:12 GMT
Gelesh created HADOOP-9168:

             Summary: The Naming and Inheritance for RecordReader, LineRecordReader, LineReader

                 Key: HADOOP-9168
                 URL: https://issues.apache.org/jira/browse/HADOOP-9168
             Project: Hadoop Common
          Issue Type: Improvement
          Components: util
    Affects Versions: 0.23.5, 2.0.2-alpha, 0.21.0
            Reporter: Gelesh
            Priority: Minor
             Fix For: site, hudson, 1.2.0, 0.23.2

I feel LineReader is not the correct name, since it reads up to a given delimiter.

How about TextRecordReader?
That sounds correct, but LineReader is not a RecordReader by inheritance,
although by functionality it is a record reader.

Now let us look at it from a different angle.

In general,
an InputFormat has two main responsibilities:
1) To read a split.
2) To generate key/value pairs from what was read from the split.

In TextInputFormat,
there is a RecordReader, which is extended by LineRecordReader,
which in turn uses another class, LineReader.

But what we actually have is:
LineReader, which does the reading of the file.
LineRecordReader, which generates the key and value.
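
To make the division of work concrete, here is a minimal sketch in plain Java. It is not the Hadoop source: SketchLineReader and SketchLineRecordReader are hypothetical stand-ins, the delimiter is assumed to be a single byte, the text is assumed to be ASCII, and the key is assumed to be the byte offset at which the record starts.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Hypothetical stand-in for LineReader: it only reads bytes up to a delimiter.
class SketchLineReader {
    private final InputStream in;
    private final int delimiter;

    SketchLineReader(InputStream in, char delimiter) {
        this.in = in;
        this.delimiter = delimiter;
    }

    // Appends the next record (without the delimiter) to out and returns the
    // number of bytes consumed, delimiter included. 0 means end of input.
    int readRecord(StringBuilder out) throws IOException {
        int consumed = 0;
        int b;
        while ((b = in.read()) != -1) {
            consumed++;
            if (b == delimiter) {
                break;
            }
            out.append((char) b); // assumption: ASCII-only input
        }
        return consumed;
    }
}

// Hypothetical stand-in for LineRecordReader: it does no reading itself and
// only turns what the reader returns into key/value pairs.
class SketchLineRecordReader {
    private final SketchLineReader reader;
    private long pos = 0; // byte offset of the next record

    SketchLineRecordReader(SketchLineReader reader) {
        this.reader = reader;
    }

    // Returns (start offset, record text), or null at end of input.
    Map.Entry<Long, String> next() throws IOException {
        StringBuilder value = new StringBuilder();
        int consumed = reader.readRecord(value);
        if (consumed == 0) {
            return null;
        }
        Map.Entry<Long, String> entry = new SimpleEntry<>(pos, value.toString());
        pos += consumed;
        return entry;
    }
}
```

In this sketch the class that actually reads records is the one not called a record reader, which is the naming mismatch described above.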

I would suggest,

RecordReader      to be renamed as     KeyValueGenerator,
LineRecordReader  to be renamed as     TextInputKeyValueGenerator,
LineReader        to be renamed as     delimitedTextReader.

The generic attributes of LineReader (such as start, pos, end, buffer, bufferBytes, etc.)
should be abstracted into a class called RecordReader,
since they are all specific to reading the given input.

The delimitedTextReader class could then extend RecordReader.
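
A rough sketch of how the suggested hierarchy might look, in plain Java rather than against the Hadoop API. Everything beyond the proposed class names is an assumption made for illustration: the method names readRecord, getPos and nextKeyValue, the single-byte delimiter, the use of Long and String for key and value, the capitalized spelling DelimitedTextReader, and the omission of the buffer fields for brevity.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Proposed RecordReader: holds only the generic reading state.
abstract class RecordReader {
    protected final InputStream in;
    protected final long start; // assumption: the stream is already positioned here
    protected final long end;
    protected long pos;

    protected RecordReader(InputStream in, long start, long end) {
        this.in = in;
        this.start = start;
        this.end = end;
        this.pos = start;
    }

    // Returns the next record, or null when the split or the stream is exhausted.
    abstract String readRecord() throws IOException;

    long getPos() {
        return pos;
    }
}

// Proposed delimitedTextReader (capitalized here per Java convention).
class DelimitedTextReader extends RecordReader {
    private final int delimiter;

    DelimitedTextReader(InputStream in, long start, long end, char delimiter) {
        super(in, start, end);
        this.delimiter = delimiter;
    }

    @Override
    String readRecord() throws IOException {
        if (pos >= end) {
            return null;
        }
        StringBuilder record = new StringBuilder();
        boolean sawByte = false;
        int b;
        while ((b = in.read()) != -1) {
            sawByte = true;
            pos++;
            if (b == delimiter) {
                break;
            }
            record.append((char) b); // assumption: ASCII-only input
        }
        return sawByte ? record.toString() : null;
    }
}

// Proposed KeyValueGenerator: the role RecordReader plays today.
interface KeyValueGenerator<K, V> {
    Map.Entry<K, V> nextKeyValue() throws IOException;
}

// Proposed TextInputKeyValueGenerator: the role LineRecordReader plays today.
class TextInputKeyValueGenerator implements KeyValueGenerator<Long, String> {
    private final RecordReader reader;

    TextInputKeyValueGenerator(RecordReader reader) {
        this.reader = reader;
    }

    @Override
    public Map.Entry<Long, String> nextKeyValue() throws IOException {
        long offset = reader.getPos();
        String record = reader.readRecord();
        return record == null ? null : new SimpleEntry<>(offset, record);
    }
}
```

With this split, the key/value generator depends only on the abstract RecordReader, so a reader for a different record layout could be swapped in without touching it.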

Now the names would make better sense. We must also look into compatibility. The change might
be unfit to deploy unless a new API is introduced.

