hadoop-mapreduce-issues mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1210) standalone \r is treated as new line by RecordLineReader
Date Wed, 11 Nov 2009 18:56:39 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776573#action_12776573 ]

Owen O'Malley commented on MAPREDUCE-1210:

This is a feature. Some text files use only \r as a line break; in particular,
old Mac files have that format. The most that can be done is to add a switch that turns this
behavior off, but the switch must default to off (so that a standalone \r still ends a line)
for backwards compatibility.
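
To make the behavior under discussion concrete, here is a minimal sketch of a line splitter with such a switch. It is not Hadoop's actual reader code; the class name LineSplitSketch and the crIsLineBreak parameter are hypothetical, chosen only for illustration. It assumes that \n and \r\n always end a record, and that a standalone \r ends one only when the flag is set.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; NOT the Hadoop implementation.
final class LineSplitSketch {

    /**
     * Splits text into records. "\n" and "\r\n" always end a record.
     * A standalone "\r" ends a record only when crIsLineBreak is true
     * (hypothetical switch; true corresponds to the behavior described above).
     */
    static List<String> split(String text, boolean crIsLineBreak) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c == '\n') {
                records.add(current.toString());
                current.setLength(0);
                i++;
            } else if (c == '\r' && i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                // "\r\n" counts as a single terminator.
                records.add(current.toString());
                current.setLength(0);
                i += 2;
            } else if (c == '\r' && crIsLineBreak) {
                // Standalone "\r" (old Mac style) ends the record.
                records.add(current.toString());
                current.setLength(0);
                i++;
            } else {
                // Ordinary character, or a standalone "\r" kept as data.
                current.append(c);
                i++;
            }
        }
        // Emit a final record that has no trailing terminator.
        if (current.length() > 0) {
            records.add(current.toString());
        }
        return records;
    }

    public static void main(String[] args) {
        String data = "a\rb\nc\r\nd";
        System.out.println(split(data, true).size());  // 4 records
        System.out.println(split(data, false).size()); // 3 records
    }
}
```

With the same input, the two settings yield different record counts, which is the kind of discrepancy the reporter describes below.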

> standalone \r is treated as new line by RecordLineReader
> --------------------------------------------------------
>                 Key: MAPREDUCE-1210
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1210
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Olga Natkovich
> In Pig 0.6.0 we are switching to RecordLineReader from our own implementation. We are
seeing differences in record counts that were traced to the fact that a standalone \r is
treated as a line end. I don't think there is any precedent for this, and we would like to get
this resolved so that we can use RLR and not break backward compatibility. (This problem was
detected with real user data.)

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
