accumulo-notifications mailing list archives

From "Christopher Tubbs (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-836) Specify Charset on getBytes() call for String objects.
Date Wed, 31 Oct 2012 21:59:11 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488278#comment-13488278 ]

Christopher Tubbs commented on ACCUMULO-836:
--------------------------------------------

So, I looked over these changes and didn't see anything that would be too problematic,
but I did notice that there are places where we decode bytes into a String using
new String(byte[]) without specifying the encoding of the byte[]. In some cases this causes
a discrepancy with the corresponding setter that uses .getBytes(utf8). For instance, in
InputFormatBase, we have {code:java}conf.set(PASSWORD, new String(Base64.encodeBase64(passwd)));{code}
Aside from the problematic fact that this Base64 library encodes to a byte[] instead of to
a String, it doesn't document the fact that these bytes are ASCII encoded. If the user's
system had a default encoding that is incompatible with ASCII, this constructor may behave
unexpectedly, typically by silently producing the wrong characters, as it decodes the ASCII
bytes into the Java String type. Reading the password has no such problem: if the password
is "Stringified" into the job configuration without error, then calling getBytes(utf8) on
the ASCII characters in the Java String should not throw an exception. While this is not
likely to cause a problem in the overwhelming majority of cases, it seems inconsistent to
be pedantic about encoding with .getBytes() when we aren't equally pedantic about decoding
with new String(byte[]) and similar.
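
To make the symmetry concrete, here is a minimal sketch of what naming the charset on both
sides could look like. It is not the Accumulo code: PasswordCodec and its methods are
hypothetical names, and it assumes a JDK that provides java.util.Base64 and StandardCharsets
in place of the Base64 library and utf8 constant used in the patch. UTF-16 stands in for a
default charset that is incompatible with ASCII.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical helper, not part of Accumulo: both directions name their charset.
final class PasswordCodec {
    private PasswordCodec() {}

    // Setter side: Base64 output is pure ASCII, so decode those bytes as US-ASCII
    // explicitly instead of relying on the platform default charset.
    static String encode(byte[] password) {
        byte[] base64 = Base64.getEncoder().encode(password);
        return new String(base64, StandardCharsets.US_ASCII);
    }

    // Getter side: the mirror image, with the same charset named explicitly.
    static byte[] decode(String stored) {
        return Base64.getDecoder().decode(stored.getBytes(StandardCharsets.US_ASCII));
    }

    // Simulates new String(byte[]) on a system whose default charset is not
    // ASCII-compatible (UTF-16 is an assumption chosen for illustration).
    static String encodeWithIncompatibleCharset(byte[] password) {
        byte[] base64 = Base64.getEncoder().encode(password);
        return new String(base64, StandardCharsets.UTF_16);
    }

    public static void main(String[] args) {
        byte[] password = "s3cr\u00e9t".getBytes(StandardCharsets.UTF_8);
        String stored = encode(password);
        // The round trip restores the original bytes.
        System.out.println(Arrays.equals(password, decode(stored)));
        // The mismatched decoding yields a different String, with no exception raised.
        System.out.println(stored.equals(encodeWithIncompatibleCharset(password)));
    }
}
```

With explicit charsets the round trip does not depend on the machine it runs on, which is
the same property the getBytes() changes are after.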

So, to summarize, I think this is generally on the right path, but needs more focus on both
sides (serialization and deserialization) of transient (M/R) and persistent (ZooKeeper) state.
                
> Specify Charset on getBytes() call for String objects.
> ------------------------------------------------------
>
>                 Key: ACCUMULO-836
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-836
>             Project: Accumulo
>          Issue Type: Improvement
>    Affects Versions: 1.5.0
>            Reporter: David Medinets
>            Assignee: David Medinets
>            Priority: Minor
>             Fix For: 1.5.0
>
>         Attachments: UTF8.java
>
>
> The comments on ACCUMULO-241 indicate that the build server might have a different default
> Charset than computers used by developers. Therefore, some of the tests have different results
> on different computers.
> Every getBytes call on a String object should specify the UTF8 Charset. Unfortunately
> the codebase has nearly 1,800 getBytes calls.
