accumulo-dev mailing list archives

From Josh Elser <>
Subject Re: Setting Charset in getBytes() call.
Date Mon, 29 Oct 2012 16:21:19 GMT
David, I beg to differ.

Setting it via the JVM property is a single change to make, whereas if
you change every single usage of getBytes(), you have now forced the next
person to branch the code, change everything to UTF-16 (hypothetical use
case), and maintain a diverged codebase forever.

I would say that the reason such a JVM property exists is to spare you
from having to make these code changes in the first place.
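
To make the trade-off concrete, here is a minimal sketch (not code from Accumulo) of the difference between the no-argument getBytes(), which follows the JVM default charset, and a call that names the charset. The class and method names are hypothetical and chosen only for illustration.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical demo class; not part of the Accumulo codebase.
public class DefaultCharsetDemo {

    // Encodes with whatever charset the JVM resolved as its default.
    static byte[] encodeWithDefault(String s) {
        return s.getBytes();
    }

    // Encodes with UTF-8 regardless of how the JVM was launched.
    static byte[] encodeWithUtf8(String s) {
        return s.getBytes(Charset.forName("UTF-8"));
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // contains one non-ASCII character
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("default bytes:   " + Arrays.toString(encodeWithDefault(text)));
        System.out.println("UTF-8 bytes:     " + Arrays.toString(encodeWithUtf8(text)));
    }
}
```

Running it once as-is and once with a flag such as `java -Dfile.encoding=ISO-8859-1 DefaultCharsetDemo` should show whether the first two lines change while the third stays fixed. How the flag is honored can vary between JVM versions, so treat the output as something to check on the target platform.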

On 10/29/2012 12:00 PM, David Medinets wrote:
> I like the idea of making the change explicit in the source code.
> Setting the encoding in the jvm property would be easier but not as
> explicit. I have a few dozen of the files changed. Today I have free
> time since Hurricane Sandy has closed offices.
> On Mon, Oct 29, 2012 at 11:39 AM, William Slacum
> <> wrote:
>> Isn't it easier to just set the JVM property `file.encoding`?
>> On Sun, Oct 28, 2012 at 3:18 PM, Ed Kohlwey <> wrote:
>>> If you use a private static field in each class for the charset, it will
>>> basically be a singleton because charsets are cached in Charset.forName.
>>> IMHO this is a somewhat cleaner approach than having lots of static imports
>>> to utility classes with lots of constants in them.
>>> On Oct 28, 2012 5:50 PM, "David Medinets" <>
>>> wrote:
>>>> In this comment, John mentioned that all getBytes() method calls
>>>> should be changed to use UTF8. There are about 1,800 getBytes() calls
>>>> and not all of them involve String objects. I am working on ways to
>>>> identify a subset of these calls to change.
>>>> I have created to
>>>> track this issue.
>>>> Should we create one static Charset object?
>>>>    class AccumuloDefaultCharset {
>>>>      public static final Charset UTF8 = Charset.forName("UTF8");
>>>>    }
>>>> Should we use a static constant?
>>>>    public static final String UTF8 = "UTF8";
>>>> I have found one instance of getBytes() in InputFormatBase:
>>>>    protected static byte[] getPassword(Configuration conf) {
>>>>      return Base64.decodeBase64(conf.get(PASSWORD, "").getBytes());
>>>>    }
>>>> Are there any reasons why I can't start specifying the charset? Is
>>>> UTF8 the right Charset to use? I am not an expert in non-English
>>>> charsets, so guidance would be welcome.
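
As a rough sketch of the explicit approach discussed in the thread, the following uses a private static Charset field per class, as Ed suggests, and passes it to getBytes(). The class name, field name, and helper methods are hypothetical; the Hadoop Configuration and Base64 calls from the getPassword() example are left out so the snippet stays self-contained.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical class; illustrates the per-class constant idea only.
public class ExplicitCharsetSketch {

    // Looked up once per class; Charset.forName throws an unchecked
    // exception at class-load time if the name is not supported.
    private static final Charset UTF8 = Charset.forName("UTF-8");

    // Encodes a string with the explicit charset instead of the JVM default.
    static byte[] toBytes(String value) {
        return value.getBytes(UTF8);
    }

    // Decodes with the same charset so the round trip is symmetric.
    static String fromBytes(byte[] bytes) {
        return new String(bytes, UTF8);
    }

    public static void main(String[] args) {
        byte[] encoded = toBytes("caf\u00e9");
        System.out.println(Arrays.toString(encoded));
        System.out.println(fromBytes(encoded).equals("caf\u00e9"));
    }
}
```

One point in favor of a Charset constant over a String constant: getBytes(String charsetName) declares the checked UnsupportedEncodingException, so every call site needs a try/catch or a throws clause, while getBytes(Charset) does not. On Java 7 and later, java.nio.charset.StandardCharsets.UTF_8 could replace the forName lookup.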
