hadoop-common-issues mailing list archives

From "Ahmed Radwan (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-7096) Allow setting of end-of-record delimiter for TextInputFormat
Date Tue, 11 Jan 2011 01:10:46 GMT
Allow setting of end-of-record delimiter for TextInputFormat

                 Key: HADOOP-7096
                 URL: https://issues.apache.org/jira/browse/HADOOP-7096
             Project: Hadoop Common
          Issue Type: Improvement
            Reporter: Ahmed Radwan
         Attachments: 2.patch

The patch for https://issues.apache.org/jira/browse/MAPREDUCE-2254 required minor changes
to the LineReader class to allow extensions (see attached 2.patch). Description copied below:

It would be useful to allow setting the end-of-record delimiter for TextInputFormat. The current
implementation hardcodes '\n', '\r' or '\r\n' as the only possible record delimiters. This
is a problem if users have embedded newlines in their data fields (which is pretty common).
This is also a problem for other tools using TextInputFormat (see, for example, https://issues.apache.org/jira/browse/PIG-836
and https://issues.cloudera.org/browse/SQOOP-136).
I have written a patch to address this issue. The patch allows users to specify any custom
end-of-record delimiter using a newly added configuration property. For backward compatibility,
if this new configuration property is absent, the previous delimiters are used unchanged
(i.e., '\n', '\r' or '\r\n').
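
The behavior described above can be illustrated with a small standalone sketch. This is not the code from the attached patch and does not use Hadoop's LineReader; the class and method names (DelimitedRecordSketch, readRecords, splitCustom, splitDefault) are hypothetical, and a null delimiter stands in for "configuration property absent". It only shows how a custom delimiter keeps embedded newlines inside a record, while the fallback still splits on '\n', '\r' or '\r\n'.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not Hadoop's LineReader and not the attached patch.
final class DelimitedRecordSketch {

    // A null or empty delimiter models the "property absent" case and
    // falls back to the default line endings.
    static List<String> readRecords(byte[] data, byte[] delimiter) {
        if (delimiter == null || delimiter.length == 0) {
            return splitDefault(data);
        }
        return splitCustom(data, delimiter);
    }

    // Default behavior: a record ends at '\n', '\r' or '\r\n'.
    static List<String> splitDefault(byte[] data) {
        List<String> records = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < data.length; i++) {
            if (data[i] == '\n' || data[i] == '\r') {
                records.add(new String(data, start, i - start, StandardCharsets.UTF_8));
                // Treat "\r\n" as a single delimiter.
                if (data[i] == '\r' && i + 1 < data.length && data[i + 1] == '\n') {
                    i++;
                }
                start = i + 1;
            }
        }
        if (start < data.length) {
            records.add(new String(data, start, data.length - start, StandardCharsets.UTF_8));
        }
        return records;
    }

    // Custom behavior: a record ends only at the given byte sequence,
    // so newlines inside a record are preserved.
    static List<String> splitCustom(byte[] data, byte[] delimiter) {
        List<String> records = new ArrayList<>();
        int start = 0;
        int i = 0;
        while (i + delimiter.length <= data.length) {
            if (matchesAt(data, i, delimiter)) {
                records.add(new String(data, start, i - start, StandardCharsets.UTF_8));
                i += delimiter.length;
                start = i;
            } else {
                i++;
            }
        }
        if (start < data.length) {
            records.add(new String(data, start, data.length - start, StandardCharsets.UTF_8));
        }
        return records;
    }

    private static boolean matchesAt(byte[] data, int offset, byte[] delimiter) {
        for (int k = 0; k < delimiter.length; k++) {
            if (data[offset + k] != delimiter[k]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] input = "a,\"line1\nline2\"|b,plain|".getBytes(StandardCharsets.UTF_8);
        // With '|' as the delimiter, the quoted newline stays inside the first record.
        System.out.println(readRecords(input, "|".getBytes(StandardCharsets.UTF_8)));
        // With no delimiter configured, the same input is split at the embedded newline.
        System.out.println(readRecords(input, null));
    }
}
```

The sketch works on an in-memory byte array for brevity; a real reader would have to handle delimiters that straddle buffer and input-split boundaries, which is presumably where the LineReader changes come in.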

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.