commons-issues mailing list archives

From "luccioman (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (IO-557) UnsupportedEncodingException when opening an ISO-8859-1 XML stream with Turkish as the default locale
Date Mon, 11 Dec 2017 07:32:00 GMT

    [ https://issues.apache.org/jira/browse/IO-557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16285599#comment-16285599
] 

luccioman commented on IO-557:
------------------------------

The issue was detected within the [YaCy|https://yacy.net/] search engine integration.
You can easily reproduce it by running the [GenericXMLParserTest.java
|https://github.com/yacy/yacy_search_server/blob/master/test/java/net/yacy/document/parser/GenericXMLParserTest.java]
JUnit test: it runs fine with an English locale, but the "testParse" and "testParseISO_8859_1Charset"
tests fail when the locale is set to Turkish (for example with the JVM option -Duser.language=tr).

> UnsupportedEncodingException when opening an ISO-8859-1 XML stream with Turkish as the
default locale
> -----------------------------------------------------------------------------------------------------
>
>                 Key: IO-557
>                 URL: https://issues.apache.org/jira/browse/IO-557
>             Project: Commons IO
>          Issue Type: Bug
>          Components: Streams/Writers
>    Affects Versions: 2.6
>         Environment: JVM running with argument -Duser.language=tr, or on an Operating
System with Turkish as its preferred language.
>            Reporter: luccioman
>            Priority: Minor
>              Labels: easyfix
>
> When the default locale is set to the Turkish language, using the XmlStreamReader constructor
on an XML stream whose prolog gives the ISO-8859-1 charset name in lowercase as its encoding
throws an UnsupportedEncodingException (java.io.UnsupportedEncodingException: İSO-8859-1).
> Example XML prolog : <?xml version="1.0" encoding="iso-8859-1"?>
> This is apparently because the XmlStreamReader class uses String.toUpperCase() in its
getXmlProlog() function. It should rather use toUpperCase(Locale.ROOT) or toUpperCase(Locale.US),
as already done in the getContentTypeEncoding() function. Otherwise the behaviour depends on
the default locale: in Turkish, the dotted lowercase i is uppercased to a dotted capital İ (U+0130)
rather than the plain ASCII I, which is not the case in most other languages.
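
The locale dependence described above can be shown in isolation, without Commons IO. The following is only a minimal sketch: the class name UpperCaseLocaleDemo and the helper normalizeEncoding are hypothetical names made up for illustration, not part of XmlStreamReader, and the actual fix in the library may look different.

```java
import java.nio.charset.Charset;
import java.util.Locale;

// Hypothetical demo class, not part of Commons IO.
class UpperCaseLocaleDemo {

    // Hypothetical helper: uppercases an encoding name independently of the
    // JVM default locale, as the issue suggests doing with Locale.ROOT.
    static String normalizeEncoding(String encoding) {
        return encoding.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String encoding = "iso-8859-1";
        Locale turkish = Locale.forLanguageTag("tr");

        // Passing the Turkish locale explicitly has the same effect as calling
        // toUpperCase() with no argument on a JVM started with -Duser.language=tr:
        // the first character becomes U+0130 (dotted capital I), not ASCII 'I'.
        String turkishUpper = encoding.toUpperCase(turkish);
        System.out.println((int) turkishUpper.charAt(0)); // prints 304 (0x130)

        // The locale-independent variant yields a plain ASCII name.
        String rootUpper = normalizeEncoding(encoding);
        System.out.println(rootUpper);                      // prints ISO-8859-1
        System.out.println(Charset.isSupported(rootUpper)); // prints true
    }
}
```

Because the Turkish-uppercased name no longer starts with an ASCII letter, it cannot match any known charset name, which is consistent with the UnsupportedEncodingException reported here.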



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
