hadoop-common-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-7096) Allow setting of end-of-record delimiter for TextInputFormat
Date Wed, 09 Feb 2011 01:43:58 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HADOOP-7096:

       Resolution: Fixed
    Fix Version/s: 0.23.0
         Assignee: Ahmed Radwan
     Hadoop Flags: [Reviewed]
           Status: Resolved  (was: Patch Available)

Committed to trunk, thanks Ahmed!

> Allow setting of end-of-record delimiter for TextInputFormat
> ------------------------------------------------------------
>                 Key: HADOOP-7096
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7096
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Ahmed Radwan
>            Assignee: Ahmed Radwan
>             Fix For: 0.23.0
>         Attachments: HADOOP-7096.patch, HADOOP-7096_r2.patch, HADOOP-7096_r3.patch, hadoop-7096_r4.patch
> The patch for https://issues.apache.org/jira/browse/MAPREDUCE-2254 required minor changes
> to the LineReader class to allow extensions (see attached 2.patch). Description copied below:
> It will be useful to allow setting the end-of-record delimiter for TextInputFormat. The
> current implementation hardcodes '\n', '\r' or '\r\n' as the only possible record delimiters.
> This is a problem if users have embedded newlines in their data fields (which is pretty common).
> This is also a problem for other tools using this TextInputFormat (see, for example,
> https://issues.apache.org/jira/browse/PIG-836 and https://issues.cloudera.org/browse/SQOOP-136).
> I have written a patch to address this issue. This patch allows users to specify any custom
> end-of-record delimiter using a newly added configuration property. For backward compatibility,
> if this new configuration property is absent, then the exact same delimiters as before are
> used (i.e., '\n', '\r' or '\r\n').
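
The behavior described above can be illustrated with a small standalone sketch. It is
not the committed patch and does not use any Hadoop classes: the class name
RecordSplitter and the method splitRecords are hypothetical, and it works on an
in-memory String, whereas LineReader reads from a byte stream. A null or empty
delimiter stands in for the configuration property being absent.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the Hadoop LineReader implementation.
class RecordSplitter {

    // Splits data into records. A null or empty delimiter models the case where
    // the configuration property is absent, so '\n', '\r' and '\r\n' all end a record.
    static List<String> splitRecords(String data, String delimiter) {
        List<String> records = new ArrayList<>();
        int start = 0;

        if (delimiter == null || delimiter.isEmpty()) {
            int i = 0;
            while (i < data.length()) {
                char c = data.charAt(i);
                if (c == '\n' || c == '\r') {
                    records.add(data.substring(start, i));
                    // Treat "\r\n" as a single delimiter.
                    if (c == '\r' && i + 1 < data.length() && data.charAt(i + 1) == '\n') {
                        i++;
                    }
                    start = i + 1;
                }
                i++;
            }
        } else {
            // Custom delimiter: newlines inside a record are kept as data.
            int idx;
            while ((idx = data.indexOf(delimiter, start)) >= 0) {
                records.add(data.substring(start, idx));
                start = idx + delimiter.length();
            }
        }

        // Emit a final record that is not followed by a delimiter.
        if (start < data.length()) {
            records.add(data.substring(start));
        }
        return records;
    }

    public static void main(String[] args) {
        String data = "id=1\nnote=line one\nline two|#|id=2\nnote=x|#|";
        System.out.println(splitRecords(data, "|#|").size() + " records with custom delimiter");
        System.out.println(splitRecords(data, null).size() + " records with default delimiters");
    }
}
```

With the custom delimiter "|#|" the embedded newlines stay inside each record; with the
default delimiters the same input is broken at every newline, which is the problem the
issue describes.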

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

