hadoop-common-dev mailing list archives

From "Dennis Kubes (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-473) TextInputFormat does not correctly handle all line endings
Date Tue, 22 Aug 2006 17:07:13 GMT
TextInputFormat does not correctly handle all line endings
----------------------------------------------------------

                 Key: HADOOP-473
                 URL: http://issues.apache.org/jira/browse/HADOOP-473
             Project: Hadoop
          Issue Type: Bug
          Components: mapred
    Affects Versions: 0.5.0, 0.6.0
         Environment: All environments
            Reporter: Dennis Kubes
         Attachments: text-input-format.patch

The current TextInputFormat readLine method stops reading at the first '\r' or '\n' character.
 For Windows-formatted text files, whose lines end in "\r\n", this leaves the trailing '\n'
in the stream, so the next call to readLine on the same input stream returns an empty string.
 The attached patch corrects this by recognizing both single- and double-character line
endings and positioning the input stream at the start of the next line.  It correctly handles
Windows ("\r\n"), Mac ('\r'), and Unix ('\n') line endings.
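
To illustrate the behavior described above, here is a minimal sketch in Java. It is not the attached patch and not the actual TextInputFormat code. The class name LineEndingSketch, the helper readAllLines, and the use of PushbackInputStream are assumptions made for this example only. The idea is that after seeing '\r', the reader looks at one more byte and consumes it only if it is '\n'.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of Hadoop.
class LineEndingSketch {

    // Reads one line, treating "\n", "\r", or "\r\n" as a single terminator.
    // Returns null when the stream is already at its end.
    static String readLine(PushbackInputStream in) throws IOException {
        int b = in.read();
        if (b == -1) {
            return null;
        }
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        while (b != -1) {
            if (b == '\n') {
                break; // Unix line ending
            }
            if (b == '\r') {
                // Look one byte ahead: consume it only if it completes "\r\n".
                int next = in.read();
                if (next != '\n' && next != -1) {
                    in.unread(next); // Mac line ending; push the byte back
                }
                break;
            }
            line.write(b);
            b = in.read();
        }
        return new String(line.toByteArray(), StandardCharsets.UTF_8);
    }

    // Convenience helper (an assumption for this sketch): splits a string
    // into lines by calling readLine until the stream is exhausted.
    static List<String> readAllLines(String text) {
        PushbackInputStream in = new PushbackInputStream(
                new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
        List<String> lines = new ArrayList<>();
        try {
            for (String line = readLine(in); line != null; line = readLine(in)) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Mixed Windows, Mac, and Unix endings in one input.
        System.out.println(readAllLines("a\r\nb\rc\nd")); // prints [a, b, c, d]
    }
}
```

With a reader that breaks on any single '\r' or '\n', the same input would yield an extra empty string between "a" and "b", which is the symptom reported here.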

