logging-log4j-dev mailing list archives

From "Ralph Goers (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LOG4J2-255) Multi-byte character strings are scrambled in log output
Date Thu, 16 May 2013 06:03:16 GMT

    [ https://issues.apache.org/jira/browse/LOG4J2-255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13659259#comment-13659259 ]

Ralph Goers commented on LOG4J2-255:

Nick, while UTF-8 is capable of representing characters in many languages, most computers don't
display characters on the screen in Unicode.  They use what IBM calls code pages. For example,
Gary mentioned cp 1252 - cp stands for code page. http://en.wikipedia.org/wiki/Code_page gives
a simple explanation of what they are.  So the problem is that although you may have data
in Unicode, to display it on the screen so that it is viewable it must be converted to the
proper code page.  Since Strings in Java are always Unicode (held internally as UTF-16, not
UTF-8), when you call getBytes() on the string, passing in a charset allows Java to convert
the Unicode text into the target code page, provided that the Java runtime supports that
charset.  This is why Layouts accept a charset parameter. The charset is Java's name for a
code page.
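
To make the getBytes() point concrete, here is a minimal sketch. The class and method names
are made up for illustration, and Shift_JIS is only an assumed example of a Japanese code
page (it is guarded because not every Java runtime ships it). The string is written with
Unicode escapes so the result does not depend on the encoding of the source file.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of Log4j.
class CharsetBytesSketch {
    // Convert the String's Unicode text into bytes of the requested charset.
    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    public static void main(String[] args) {
        // The "Japanese test" string from the issue, as Unicode escapes.
        String text = "\u65E5\u672C\u8A9E\u30C6\u30B9\u30C8";

        byte[] utf8 = encode(text, StandardCharsets.UTF_8);
        System.out.println("UTF-8 byte count: " + utf8.length);

        // Assumption: Shift_JIS stands in for "the proper Japanese code page".
        if (Charset.isSupported("Shift_JIS")) {
            byte[] sjis = encode(text, Charset.forName("Shift_JIS"));
            System.out.println("Shift_JIS byte count: " + sjis.length);
        }
    }
}
```

The same six characters come out as different byte sequences depending on the charset, so
whatever reads the bytes has to use the same charset that was passed to getBytes().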

What I don't understand here is why, if Remko is generating logs in UTF-8 that contain Japanese
characters and is specifying the proper Japanese code page for the host computer, the output
is unreadable. If no charset is specified then it is perfectly understandable
why this would be happening.
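
As a rough illustration of how a charset mismatch produces unreadable output, the sketch
below encodes text with one charset and decodes it with another. This only mimics a writer
and a viewer that disagree; it is not a diagnosis of the actual bug in this issue. The class
name, the method name, and the choice of windows-31j as the "viewer" charset are assumptions.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of Log4j.
class CharsetMismatchSketch {
    // Encode with the writer's charset, then decode with the reader's charset.
    static String roundTrip(String text, Charset writer, Charset reader) {
        return new String(text.getBytes(writer), reader);
    }

    public static void main(String[] args) {
        // The "Japanese test" string from the issue, as Unicode escapes.
        String text = "\u65E5\u672C\u8A9E\u30C6\u30B9\u30C8";
        Charset utf8 = StandardCharsets.UTF_8;

        // Writer and reader agree: the text survives unchanged.
        System.out.println("matching charsets intact: "
                + roundTrip(text, utf8, utf8).equals(text));

        // Assumption: the viewer decodes with a Japanese Windows code page.
        if (Charset.isSupported("windows-31j")) {
            Charset viewer = Charset.forName("windows-31j");
            System.out.println("mismatched charsets intact: "
                    + roundTrip(text, utf8, viewer).equals(text));
        }
    }
}
```

When the two charsets differ, the bytes are still all there, but they are interpreted as
different characters, which is the kind of scrambling shown in the issue description.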

Note that this is actually the proper way to perform internationalization/localization -
the Strings should be manipulated in Unicode and passed from client to server that way and only
converted to the target code page when they are actually displayed.
> Multi-byte character strings are scrambled in log output
> --------------------------------------------------------
>                 Key: LOG4J2-255
>                 URL: https://issues.apache.org/jira/browse/LOG4J2-255
>             Project: Log4j 2
>          Issue Type: Bug
>          Components: Appenders, Core
>    Affects Versions: 2.0-beta6
>            Reporter: Remko Popma
>            Assignee: Remko Popma
>            Priority: Blocker
>             Fix For: 2.0-beta7
> When I tried to log a Japanese string the output was scrambled in both the Console and a log file.
> For example,
> logger.warn("日本語テスト"); // (Japanese test)
> came out as
> 15:07:00.184 [main] WARN  test.JapaneseTest - 譌・譛ャ隱槭ユ繧ケ繝?

