hadoop-common-issues mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-7096) Allow setting of end-of-record delimiter for TextInputFormat
Date Mon, 07 Feb 2011 22:14:57 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991654#comment-12991654 ]

Hadoop QA commented on HADOOP-7096:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12470509/HADOOP-7096_r3.patch
  against trunk revision 1066284.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

    +1 system test framework.  The patch passed system test framework compile.

Test results: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/221//testReport/
Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/221//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/221//console

This message is automatically generated.

> Allow setting of end-of-record delimiter for TextInputFormat
> ------------------------------------------------------------
>
>                 Key: HADOOP-7096
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7096
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Ahmed Radwan
>         Attachments: HADOOP-7096.patch, HADOOP-7096_r2.patch, HADOOP-7096_r3.patch
>
>
> The patch for https://issues.apache.org/jira/browse/MAPREDUCE-2254 required minor changes
> to the LineReader class to allow extensions (see attached 2.patch). Description copied below:
> It would be useful to allow setting the end-of-record delimiter for TextInputFormat. The
> current implementation hardcodes '\n', '\r' or '\r\n' as the only possible record delimiters.
> This is a problem if users have embedded newlines in their data fields (which is pretty common).
> It is also a problem for other tools that use TextInputFormat (see, for example,
> https://issues.apache.org/jira/browse/PIG-836 and https://issues.cloudera.org/browse/SQOOP-136).
> I have written a patch to address this issue. The patch allows users to specify any custom
> end-of-record delimiter through a newly added configuration property. For backward compatibility,
> if this new configuration property is absent, the previous delimiters are used unchanged
> (i.e., '\n', '\r' or '\r\n').
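
The behavior described in the quoted issue can be illustrated with a small standalone sketch. This is not the code from the attached patches and does not use any Hadoop API: the function name split_records and its delimiter parameter are hypothetical, chosen only to show the fallback rule (no custom delimiter means '\r\n', '\r' or '\n' all end a record; a custom delimiter means embedded newlines stay inside the record).

```python
import re

# Default record delimiters named in the issue: '\r\n', '\r', or '\n'.
# '\r\n' is listed first so it is consumed as one delimiter, not two.
_DEFAULT_DELIMITERS = re.compile(r"\r\n|\r|\n")


def split_records(text, delimiter=None):
    """Split text into records (hypothetical helper, not Hadoop's LineReader).

    delimiter=None models the "property absent" case and falls back to the
    default newline delimiters. Any other non-empty string is treated as the
    one and only end-of-record marker.
    """
    if delimiter is None:
        parts = _DEFAULT_DELIMITERS.split(text)
    else:
        if delimiter == "":
            raise ValueError("delimiter must be a non-empty string")
        parts = text.split(delimiter)
    # A trailing delimiter ends the last record; it does not start an empty one.
    if parts and parts[-1] == "":
        parts.pop()
    return parts


if __name__ == "__main__":
    # Default behavior: every newline variant ends a record.
    print(split_records("a\nb\r\nc\rd"))
    # Custom delimiter: the newline embedded in the first record is preserved.
    print(split_records("x\ny|z|", delimiter="|"))
```

With a custom delimiter such as "|", a field containing a newline is kept within a single record, which is the case the issue says the hardcoded delimiters cannot handle.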

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
