hadoop-common-dev mailing list archives

From "Runping Qi (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-759) TextInputFormat should allow different treatment on carriage return char '\r'
Date Thu, 30 Nov 2006 15:21:22 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-759?page=comments#action_12454665 ] 
Runping Qi commented on HADOOP-759:

The case at hand is a bit different. We have a file consisting of a sequence of records
separated by LF ('\n').

Some of these records may themselves contain '\r'.
Thus, it is wrong to interpret '\r' as a line break.
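
To make the intended behavior concrete, here is a minimal sketch of a reader that ends a record only at '\n' and keeps any '\r' as ordinary data. This is not the Hadoop code: the class LfOnlyLineReader and its methods are hypothetical names used only for illustration, and UTF-8 input is assumed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop: splits records on LF only.
class LfOnlyLineReader {

    // Reads bytes up to the next '\n' (exclusive) or the end of the stream.
    // Returns null once the stream is exhausted and nothing was read.
    static String readLine(InputStream in) throws IOException {
        int b = in.read();
        if (b == -1) {
            return null;
        }
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        while (b != -1 && b != '\n') {
            buf.write(b); // a '\r' lands here and stays part of the record
            b = in.read();
        }
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }

    // Convenience wrapper for in-memory data: returns every record in order.
    static List<String> readAll(byte[] data) {
        List<String> records = new ArrayList<>();
        try (InputStream in = new ByteArrayInputStream(data)) {
            String line;
            while ((line = readLine(in)) != null) {
                records.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }
}
```

With this sketch, the input "a\rb\nc\n" yields two records, "a\rb" and "c", rather than three.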

> TextInputFormat should allow different treatment on carriage return char '\r'
> -----------------------------------------------------------------------------
>                 Key: HADOOP-759
>                 URL: http://issues.apache.org/jira/browse/HADOOP-759
>             Project: Hadoop
>          Issue Type: Improvement
>            Reporter: Runping Qi
> The current implementation treats both '\r' and '\n' as line breaks. However, in some
cases it is desirable to use '\n' strictly as the sole line break and to treat '\r' as
part of the data in a line.
> One way to do this is to make the readline function a member function, so that the user
can create a subclass that overrides it with the desired behavior.
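
The subclassing approach proposed in the issue description could look roughly like the sketch below. All class names here (CrLfLineReader, StrictLfLineReader, ReadLineDemo) are hypothetical and do not correspond to actual Hadoop classes. The base class stands in for the current behavior, assumed here to mean that '\r', '\n', and "\r\n" each end a line, and the subclass overrides readLine so that only '\n' does.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the default behavior: CR, LF, and CRLF all end a line.
class CrLfLineReader {
    // An instance method (not static), so a subclass can override it.
    public String readLine(PushbackInputStream in) throws IOException {
        int b = in.read();
        if (b == -1) {
            return null; // end of stream
        }
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        while (b != -1 && b != '\n' && b != '\r') {
            buf.write(b);
            b = in.read();
        }
        if (b == '\r') {
            // Swallow the LF of a CRLF pair; push back anything else.
            int next = in.read();
            if (next != '\n' && next != -1) {
                in.unread(next);
            }
        }
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }
}

// Hypothetical subclass: only LF ends a line; CR is kept as data.
class StrictLfLineReader extends CrLfLineReader {
    @Override
    public String readLine(PushbackInputStream in) throws IOException {
        int b = in.read();
        if (b == -1) {
            return null;
        }
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        while (b != -1 && b != '\n') {
            buf.write(b);
            b = in.read();
        }
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }
}

// Small helper for trying either reader on in-memory text.
class ReadLineDemo {
    static String firstLine(CrLfLineReader reader, String data) {
        byte[] bytes = data.getBytes(StandardCharsets.UTF_8);
        try (PushbackInputStream in = new PushbackInputStream(new ByteArrayInputStream(bytes))) {
            return reader.readLine(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Callers that hold a CrLfLineReader reference pick up the LF-only behavior simply by being handed a StrictLfLineReader, which is the point of making the function overridable.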


