zookeeper-issues mailing list archives

From "David Mollitor (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ZOOKEEPER-3342) Use StandardCharsets
Date Wed, 26 Jun 2019 22:42:00 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-3342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16873690#comment-16873690
] 

David Mollitor commented on ZOOKEEPER-3342:
-------------------------------------------

Java has also historically used the same encoding as the one you presented. Regardless, UTF-8
can encode every Unicode character that UTF-16 can. Like all things Java, the character encoding
works consistently across platforms.

https://softwareengineering.stackexchange.com/questions/174947/why-does-java-use-utf-16-for-internal-string-representation
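
As a rough illustration of the point above, the following sketch round-trips a Java string (UTF-16 internally) through UTF-8 bytes. The class name, method name, and sample string are made up for this example and are not taken from the ZooKeeper code base.

```java
import java.nio.charset.StandardCharsets;

public class Utf8RoundTrip {
    // Encode to UTF-8 and decode again, naming the charset explicitly in both directions.
    static String roundTrip(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical sample: ASCII, a Latin-1 letter, a CJK character, and
        // U+1F600, which needs a surrogate pair (two UTF-16 code units) in a Java String.
        String sample = "znode-\u00e9-\u4e2d-\uD83D\uDE00";
        System.out.println(sample.equals(roundTrip(sample)));
    }
}
```

Well-formed text should survive the round trip unchanged. An unpaired surrogate is not valid Unicode text and would be replaced during encoding, so that case falls outside this sketch.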

I wouldn't recommend making the character encoding a configurable option.

* There's currently no way to record the encoding used in all the various places in ZK, so if the
default changes between server restarts, reading a snapshot, reading a ZNode name, reading
a ZNode value, etc. may break.
* Allowing for a configurable character encoding will explode the test matrix for ZK. Using
UTF-8, which covers pretty much every language, will keep the testing in check.
* Since we're changing to UTF-8, which is the most permissive, the chance of a backwards compatibility
issue is very low.
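
To make the restart concern concrete, here is a minimal sketch of what can happen when bytes written under one charset are read back under another. The class, the helper method, and the sample value are hypothetical; this is not how ZooKeeper reads snapshots, only an illustration of the failure mode.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMismatch {
    // Decode stored bytes with whatever charset the reader happens to use.
    static String decodeWith(byte[] stored, Charset readerCharset) {
        return new String(stored, readerCharset);
    }

    public static void main(String[] args) {
        // Hypothetical value, written while the writer's charset was UTF-8.
        byte[] stored = "caf\u00e9".getBytes(StandardCharsets.UTF_8);

        // Same charset on read: the value comes back intact.
        System.out.println(decodeWith(stored, StandardCharsets.UTF_8));

        // Different charset on read: the two UTF-8 bytes of the accented
        // letter are decoded as two separate ISO-8859-1 characters.
        System.out.println(decodeWith(stored, StandardCharsets.ISO_8859_1));
    }
}
```

Nothing in the stored bytes says which charset produced them, so the reader cannot detect the mismatch. Fixing the charset in code avoids depending on the JVM default at all.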

http://utf8everywhere.org/

> Use StandardCharsets
> --------------------
>
>                 Key: ZOOKEEPER-3342
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3342
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: server
>            Reporter: David Mollitor
>            Assignee: David Mollitor
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.6.0
>
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> {quote}
> Encodes this String into a sequence of bytes using the platform's default charset, storing
the result into a new byte array. The behavior of this method when this string cannot be encoded
in the default charset is unspecified.
> https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#getBytes--
> {quote}
> Since this is a distributed system, it is always possible that different nodes have different
default charsets defined. I think it is safest to specify the charset explicitly across all
nodes. You could, for example, see a situation where an upgraded JVM uses a different
default, so that during a rolling upgrade of the JVM different nodes have different defaults.
> * The default charset is usually "ISO-8859-1". UTF-8 covers more of our international
friends.
> * Explicitly specifying the Charset yields slight performance gains
> * Explicitly specifying the Charset removes the need for try/catch blocks of UnsupportedEncodingException
> https://blog.codecentric.de/en/2014/04/faster-cleaner-code-since-java-7/
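
For readers unfamiliar with the try/catch point in the quoted description, the difference can be sketched as follows. The class and method names are invented for illustration and do not come from the ZooKeeper patch.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

public class ExplicitCharset {
    // Charset looked up by name: the compiler forces handling of a checked
    // exception, even though every JVM is required to support UTF-8.
    static byte[] encodeByName(String s) {
        try {
            return s.getBytes("UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 must be supported", e);
        }
    }

    // Charset passed as a constant (available since Java 7): no checked
    // exception and no name lookup.
    static byte[] encodeByConstant(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }
}
```

Both methods produce the same bytes; the second is shorter and does not depend on the platform default the way the no-argument `getBytes()` does.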



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
