hadoop-common-issues mailing list archives

From "Ahmed Radwan (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-7096) Allow setting of end-of-record delimiter for TextInputFormat
Date Tue, 11 Jan 2011 01:10:46 GMT
Allow setting of end-of-record delimiter for TextInputFormat
------------------------------------------------------------

                 Key: HADOOP-7096
                 URL: https://issues.apache.org/jira/browse/HADOOP-7096
             Project: Hadoop Common
          Issue Type: Improvement
            Reporter: Ahmed Radwan
         Attachments: 2.patch

The patch for https://issues.apache.org/jira/browse/MAPREDUCE-2254 required minor changes
to the LineReader class to allow extensions (see attached 2.patch). Description copied below:

It would be useful to allow setting the end-of-record delimiter for TextInputFormat. The current
implementation hardcodes '\n', '\r' or '\r\n' as the only possible record delimiters. This
is a problem if users have embedded newlines in their data fields (which is pretty common).
This is also a problem for other tools that use TextInputFormat (see, for example, https://issues.apache.org/jira/browse/PIG-836
and https://issues.cloudera.org/browse/SQOOP-136).
I have written a patch to address this issue. It allows users to specify any custom
end-of-record delimiter through a newly added configuration property. For backward compatibility,
if this new configuration property is absent, the previous delimiters are used unchanged
(i.e., '\n', '\r' or '\r\n').
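
To illustrate the intended behaviour, here is a minimal standalone sketch in plain Java. It is not the attached patch and does not use the Hadoop API: the class name DelimitedRecordSketch and the method splitRecords are hypothetical, and a null or empty delimiter stands in for "configuration property absent". Unlike a streaming reader, it takes the whole input as a String, which is only reasonable for a demonstration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; names are not taken from the patch.
class DelimitedRecordSketch {

    /**
     * Splits text into records.
     * If delimiter is null or empty, falls back to the default
     * behaviour: '\n', '\r' or '\r\n' each end a record.
     * Otherwise only the exact custom delimiter ends a record,
     * so newlines inside a record are preserved.
     */
    static List<String> splitRecords(String text, String delimiter) {
        List<String> records = new ArrayList<>();
        int start = 0;

        if (delimiter == null || delimiter.isEmpty()) {
            // Default path: treat \n, \r and \r\n as record terminators.
            for (int i = 0; i < text.length(); i++) {
                char ch = text.charAt(i);
                if (ch == '\n' || ch == '\r') {
                    records.add(text.substring(start, i));
                    // Consume the '\n' of a "\r\n" pair as part of the same terminator.
                    if (ch == '\r' && i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                        i++;
                    }
                    start = i + 1;
                }
            }
        } else {
            // Custom path: split on the user-supplied delimiter only.
            int idx;
            while ((idx = text.indexOf(delimiter, start)) != -1) {
                records.add(text.substring(start, idx));
                start = idx + delimiter.length();
            }
        }

        // Emit a trailing record that has no terminator, if any.
        if (start < text.length()) {
            records.add(text.substring(start));
        }
        return records;
    }
}
```

For example, splitting "a\nb|c\r\nd|" with the delimiter "|" keeps the embedded newlines inside the two resulting records, while passing null splits "x\ny\r\nz\rw" into four records just as the hardcoded delimiters would.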



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

