harmony-commits mailing list archives

From "Oliver Deakin (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HARMONY-6290) BufferedReader.readLine() breaks at EBCDIC newline, violating the spec
Date Wed, 05 Aug 2009 10:33:14 GMT

    [ https://issues.apache.org/jira/browse/HARMONY-6290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739420#action_12739420 ]

Oliver Deakin commented on HARMONY-6290:
----------------------------------------

Hi Jesse/Nathan,

Running the following test:

import java.io.*;

class FileTest {
  public static void main(String[] args) throws Exception {
    // Decode test.txt as EBCDIC (IBM-1047) and print only the first line.
    BufferedReader br = new BufferedReader(
        new InputStreamReader(new FileInputStream("test.txt"), "IBM-1047"));
    try {
      System.out.println(br.readLine());
    } finally {
      br.close();
    }
  }
}

against an EBCDIC test.txt file containing:
Hello<NEL>
World<NEL>
<EOF>

on Windows produces the same output from Harmony, IBM and the RI:

Hello

In other words, even when we are not on a z/OS platform, the RI treats the NEL character as a
newline character when reading a file as EBCDIC. If we remove the encoding passed to
InputStreamReader, so that the file is read in the native encoding for the Windows platform,
then again all three JDKs behave identically. When the EBCDIC file is read in the native
Windows encoding, the NEL byte value (0x15) is not mapped to the Unicode NEL character (\u0085)
and is just treated as a normal character.
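
The mapping described above can be inspected directly by decoding the single byte 0x15 under
different charsets. This is only a minimal sketch: the class name DecodeProbe and the helper
decode() are made up for illustration, ISO-8859-1 stands in for a native Windows encoding
(an assumption), and IBM-1047 is only probed when the runtime reports that it is supported.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

class DecodeProbe {
  // Hypothetical helper: decodes one byte with the named charset and
  // returns the first char of the result.
  static char decode(byte b, String charsetName) {
    return Charset.forName(charsetName).decode(ByteBuffer.wrap(new byte[] {b})).get(0);
  }

  public static void main(String[] args) {
    byte ebcdicNel = 0x15;

    // ISO-8859-1 is assumed here as a stand-in for the native Windows encoding.
    System.out.printf("ISO-8859-1: U+%04X%n", (int) decode(ebcdicNel, "ISO-8859-1"));

    // IBM-1047 is an optional charset, so check before using it.
    if (Charset.isSupported("IBM-1047")) {
      System.out.printf("IBM-1047:   U+%04X%n", (int) decode(ebcdicNel, "IBM-1047"));
    } else {
      System.out.println("IBM-1047 is not available on this runtime");
    }
  }
}
```

Comparing the two printed code points on each JDK shows what the decoder hands to
BufferedReader before any line-splitting logic runs.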

So it appears that our code currently behaves the same as the RI, even though the spec does
not mention this special case for the EBCDIC character set. I'm not sure whether there is any
way for a character to be mapped to the Unicode NEL character (\u0085) when we are not working
in EBCDIC, so it may be the case that the code we have right now has the correct logic.
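
One case that may be worth probing: ISO-8859-1 maps each byte to the code point with the same
value, so the byte 0x85 decodes to \u0085 with no EBCDIC involved. The sketch below feeds such
data to readLine(). The class name NelProbe and its helpers are hypothetical, and no output is
claimed here since the result is exactly what differs between implementations.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class NelProbe {
  // "Hello", the byte 0x85, then "World", all as ISO-8859-1 bytes.
  static final byte[] DATA = {'H', 'e', 'l', 'l', 'o', (byte) 0x85, 'W', 'o', 'r', 'l', 'd'};

  // The full decoded text; index 5 holds the NEL character.
  static String decodedText() {
    return new String(DATA, StandardCharsets.ISO_8859_1);
  }

  // Returns whatever readLine() considers the first line of DATA.
  static String firstLine() {
    try (BufferedReader br = new BufferedReader(
        new InputStreamReader(new ByteArrayInputStream(DATA), StandardCharsets.ISO_8859_1))) {
      return br.readLine();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    // Length 5 means readLine() broke at NEL; length 11 means it did not.
    System.out.println("first line length: " + firstLine().length());
  }
}
```

Running this on each JDK would show whether NEL is treated as a terminator regardless of the
source encoding.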

We could err on the safe side and add an encoding check (check that we are reading EBCDIC and
then check whether the character is \u0085), but since we already seem to match the RI
behaviour I'm not sure that is necessary. What are your thoughts?
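
For discussion, a rough sketch of what such an encoding check could look like. None of this is
existing Harmony code: the class LineBreakCheck, both method names, and the heuristic of
recognising EBCDIC by checking that 'A' encodes to the single byte 0xC1 are all assumptions
made for illustration.

```java
import java.nio.charset.Charset;

class LineBreakCheck {
  static final char NEL = '\u0085';

  // Assumed heuristic: EBCDIC code pages encode 'A' as the single byte 0xC1.
  static boolean looksEbcdic(Charset cs) {
    if (!cs.canEncode()) {
      return false;
    }
    byte[] a = "A".getBytes(cs);
    return a.length == 1 && (a[0] & 0xFF) == 0xC1;
  }

  // CR and LF always end a line; NEL does so only for an EBCDIC source.
  // Pairing CR with a following LF is left to the caller.
  static boolean isLineTerminator(char c, boolean ebcdicSource) {
    return c == '\n' || c == '\r' || (ebcdicSource && c == NEL);
  }

  public static void main(String[] args) {
    Charset latin1 = Charset.forName("ISO-8859-1");
    System.out.println(isLineTerminator(NEL, looksEbcdic(latin1)));
  }
}
```

The open design question is where the reader would get the charset from, since BufferedReader
wraps an arbitrary Reader and has no direct knowledge of the underlying encoding.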

Regards,
Oliver

> BufferedReader.readLine() breaks at EBCDIC newline, violating the spec
> ----------------------------------------------------------------------
>
>                 Key: HARMONY-6290
>                 URL: https://issues.apache.org/jira/browse/HARMONY-6290
>             Project: Harmony
>          Issue Type: Bug
>          Components: Classlib
>         Environment: SVN Revision: 800827
>            Reporter: Jesse Wilson
>         Attachments: readLine_no_EBCDIC.patch
>
>   Original Estimate: 0.33h
>  Remaining Estimate: 0.33h
>
> The spec says that BufferedReader.readLine() considers only "\r", "\n" and "\r\n" to
> be line separators. We must not permit additional separator characters. I admit that the
> RI's behaviour is surprising, and incompatible with its own Pattern and Scanner classes.
> But this is the specified behaviour; the doc explicitly calls out which character sequences
> are used as newlines. It does not permit additional characters to break lines.
> For users reading EBCDIC-encoded files, a better practice is to read through the files
> using a Scanner. That way, the application will behave the same when executed on either
> Harmony or on the RI.
> #Android

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

