commons-issues mailing list archives

From "Lantao Jin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (IO-335) Tailer#readLines - incorrect CR handling
Date Wed, 25 Sep 2013 07:07:02 GMT

    [ https://issues.apache.org/jira/browse/IO-335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13777214#comment-13777214
] 

Lantao Jin commented on IO-335:
-------------------------------

I found that Tailer still erroneously treats the CR character (\r) as a line terminator in
version 2.4.
The issue I describe occurs on Linux. When Tailer#readLines receives a character sequence
such as "aa\rbb\n", it splits it into two lines, which is not what I expect:
aa
bb
However, Linux uses the ASCII character LF (\n) as the newline character, not CR. The Wikipedia
article on newline (http://en.wikipedia.org/wiki/Newline) also lists which line terminator
each OS uses.
It shows that a lone CR is used as the newline character only on systems such as classic Mac OS.
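
To make the behaviour above concrete, here is a minimal sketch of a splitter that treats CR, LF and CRLF alike as line terminators. It is not the actual Tailer source; the class CrSplitDemo and the method splitOnAnyTerminator are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class CrSplitDemo {
    // Hypothetical helper, not Commons IO code: ends a line on CR, LF or CRLF.
    static List<String> splitOnAnyTerminator(String input) {
        List<String> lines = new ArrayList<>();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char ch = input.charAt(i);
            if (ch == '\n') {
                lines.add(sb.toString());
                sb.setLength(0);
            } else if (ch == '\r') {
                lines.add(sb.toString());
                sb.setLength(0);
                // Treat CRLF as a single terminator by skipping the LF.
                if (i + 1 < input.length() && input.charAt(i + 1) == '\n') {
                    i++;
                }
            } else {
                sb.append(ch);
            }
        }
        // Any trailing text without a terminator is not reported.
        return lines;
    }

    public static void main(String[] args) {
        // The lone CR in the middle splits the input into two lines.
        System.out.println(splitOnAnyTerminator("aa\rbb\n")); // prints [aa, bb]
    }
}
```

On Linux I would instead expect a single line containing "aa\rbb" for this input.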

One imperfect solution is to take the OS environment into account. We could record the OS
when the Tailer is initialized (http://www.ziben.com.br/java/java-os-name-property-values). But
I know this is not a good approach: logs written by a Windows application may be read by
Tailer on Linux.
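
A rough sketch of the OS-based idea follows, assuming the terminator is captured once from System.lineSeparator(). The class OsAwareSplitDemo and the method splitOn are hypothetical and not part of Commons IO; the separator is passed in as a parameter so that the result does not depend on the machine running the example.

```java
import java.util.ArrayList;
import java.util.List;

class OsAwareSplitDemo {
    // Hypothetical helper: splits only on the given separator, which a real
    // implementation might capture once from System.lineSeparator().
    static List<String> splitOn(String input, String separator) {
        if (separator.isEmpty()) {
            throw new IllegalArgumentException("separator must not be empty");
        }
        List<String> lines = new ArrayList<>();
        int start = 0;
        int idx;
        while ((idx = input.indexOf(separator, start)) >= 0) {
            lines.add(input.substring(start, idx));
            start = idx + separator.length();
        }
        // Any trailing text without a separator is not reported.
        return lines;
    }

    public static void main(String[] args) {
        // Separator of the platform running this code ("\n" on Linux).
        String platformSeparator = System.lineSeparator();
        System.out.println(splitOn("aa\rbb\n", platformSeparator).size());

        // With the Linux separator the lone CR stays inside the line...
        System.out.println(splitOn("aa\rbb\n", "\n").size()); // prints 1
        // ...but lines from a Windows log keep a trailing CR.
        System.out.println(splitOn("aa\r\nbb\r\n", "\n").get(0).endsWith("\r")); // prints true
    }
}
```

The last call shows the drawback mentioned above: a CRLF-terminated log read with the Linux separator leaves a CR at the end of every line.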

Anyway, the current code causes a problem with CR on Linux.
                
> Tailer#readLines - incorrect CR handling
> ----------------------------------------
>
>                 Key: IO-335
>                 URL: https://issues.apache.org/jira/browse/IO-335
>             Project: Commons IO
>          Issue Type: Bug
>            Reporter: Sebb
>            Assignee: Sebb
>             Fix For: 2.4
>
>
> The readLines method checks for CR. If found, it is not stored immediately, but a flag
> is set.
> If the next char is an LF, the buffer is passed to the listener without the CR.
> As soon as the next non-LF (and non-CR) character is received, the saved CR is written
> to the buffer.
> The net result is that CR before LF migrates to the start of the next non-empty line,
> and repeated CRs are collapsed. This is clearly wrong.
> The original code (before IO-274) used RandomAccessFile#readLine() which returns on CR,
> LF or CRLF.
> It looks as though the intention was to retain this behaviour whilst not blocking.
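
The handling described in the quoted report can be sketched as below. This is reconstructed only from that description and is not the actual Commons IO source; the class CrMigrationDemo and the method readLinesAsDescribed are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class CrMigrationDemo {
    // Hypothetical reconstruction of the reported behaviour, not Commons IO code.
    static List<String> readLinesAsDescribed(String input) {
        List<String> lines = new ArrayList<>();
        StringBuilder sb = new StringBuilder();
        boolean seenCR = false;
        for (char ch : input.toCharArray()) {
            switch (ch) {
                case '\n':
                    // The buffer is handed over without the CR; the flag stays set.
                    lines.add(sb.toString());
                    sb.setLength(0);
                    break;
                case '\r':
                    // Only a flag is recorded, so repeated CRs collapse into one.
                    seenCR = true;
                    break;
                default:
                    if (seenCR) {
                        // The saved CR is written before the next ordinary character.
                        sb.append('\r');
                        seenCR = false;
                    }
                    sb.append(ch);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> lines = readLinesAsDescribed("aa\r\nbb\n");
        // The CR that preceded the LF now starts the second line.
        System.out.println(lines.get(1).equals("\rbb")); // prints true
    }
}
```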

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
