hadoop-common-issues mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-7096) Allow setting of end-of-record delimiter for TextInputFormat
Date Wed, 09 Feb 2011 07:15:57 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992356#comment-12992356 ]

Hudson commented on HADOOP-7096:
--------------------------------

Integrated in Hadoop-Common-trunk-Commit #497 (See [https://hudson.apache.org/hudson/job/Hadoop-Common-trunk-Commit/497/])
    

> Allow setting of end-of-record delimiter for TextInputFormat
> ------------------------------------------------------------
>
>                 Key: HADOOP-7096
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7096
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Ahmed Radwan
>            Assignee: Ahmed Radwan
>             Fix For: 0.23.0
>
>         Attachments: HADOOP-7096.patch, HADOOP-7096_r2.patch, HADOOP-7096_r3.patch, hadoop-7096_r4.patch
>
>
> The patch for https://issues.apache.org/jira/browse/MAPREDUCE-2254 required minor changes
> to the LineReader class to allow extensions (see attached 2.patch). Description copied below:
> It would be useful to allow setting the end-of-record delimiter for TextInputFormat. The
> current implementation hardcodes '\n', '\r' or '\r\n' as the only possible record delimiters.
> This is a problem if users have embedded newlines in their data fields (which is pretty common).
> It is also a problem for other tools that use TextInputFormat (see, for example,
> https://issues.apache.org/jira/browse/PIG-836 and https://issues.cloudera.org/browse/SQOOP-136).
> I have written a patch to address this issue. The patch allows users to specify any custom
> end-of-record delimiter through a newly added configuration property. For backward compatibility,
> if this property is absent, the previous delimiters are used unchanged
> (i.e., '\n', '\r' or '\r\n').
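
The behaviour described above can be illustrated with a minimal, self-contained
sketch. This is not the code from the attached patches and it does not use the
Hadoop LineReader API: RecordSplitter and its split method are hypothetical names
chosen for illustration. The message does not name the configuration property, so
the delimiter is passed here as a plain argument, and a null or empty value stands
in for "property absent".

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Hadoop) illustrating custom record delimiters. */
final class RecordSplitter {

    private RecordSplitter() {}

    /**
     * Splits text into records on the given delimiter.
     * A null or empty delimiter models the "property absent" case and falls
     * back to the default line endings '\n', '\r' and '\r\n'.
     */
    static List<String> split(String text, String delimiter) {
        if (delimiter == null || delimiter.isEmpty()) {
            return splitOnDefaultLineEndings(text);
        }
        List<String> records = new ArrayList<>();
        int start = 0;
        int idx;
        while ((idx = text.indexOf(delimiter, start)) >= 0) {
            records.add(text.substring(start, idx));
            start = idx + delimiter.length();
        }
        if (start < text.length()) {
            // Trailing record that is not terminated by a delimiter.
            records.add(text.substring(start));
        }
        return records;
    }

    /** Backward-compatible path: '\n', '\r' or '\r\n' each end one record. */
    private static List<String> splitOnDefaultLineEndings(String text) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c == '\n' || c == '\r') {
                records.add(current.toString());
                current.setLength(0);
                // Treat "\r\n" as a single delimiter, not two.
                if (c == '\r' && i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                    i++;
                }
            } else {
                current.append(c);
            }
            i++;
        }
        if (current.length() > 0) {
            records.add(current.toString());
        }
        return records;
    }
}
```

With a custom delimiter such as "|#|", a field containing an embedded newline stays
inside one record: RecordSplitter.split("x\ny|#|z|#|", "|#|") yields the two records
"x\ny" and "z". With a null delimiter the same input would be split at the newline,
which is the problem the issue describes.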

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
