accumulo-dev mailing list archives

From David Medinets <david.medin...@gmail.com>
Subject Setting Charset in getBytes() call.
Date Sun, 28 Oct 2012 21:50:23 GMT
https://issues.apache.org/jira/browse/ACCUMULO-241?focusedCommentId=13449680&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13449680

In this comment, John mentioned that all getBytes() method calls
should be changed to use UTF8. There are about 1,800 getBytes() calls
and not all of them involve String objects. I am working on ways to
identify a subset of these calls to change.

I have created https://issues.apache.org/jira/browse/ACCUMULO-836 to
track this issue.

Should we create one static Charset object?

  class AccumuloDefaultCharset {
    public static final Charset UTF8 = Charset.forName("UTF8");
  }

Should we use a static constant?

  public static final String UTF8 = "UTF8";
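
One practical difference between the two options: String.getBytes(String charsetName) declares the checked UnsupportedEncodingException, while String.getBytes(Charset) declares no checked exception. The sketch below puts both side by side. The class and method names are hypothetical and are not existing Accumulo code, and it uses the canonical charset name "UTF-8".

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical holder class; the names are illustrative, not an Accumulo API.
final class CharsetConstantSketch {
  // Option 1: a shared Charset object.
  static final Charset UTF8 = Charset.forName("UTF-8");
  // Option 2: a shared charset name.
  static final String UTF8_NAME = "UTF-8";

  // getBytes(Charset) declares no checked exception.
  static byte[] encodeWithCharset(String s) {
    return s.getBytes(UTF8);
  }

  // getBytes(String) declares UnsupportedEncodingException,
  // so every call site has to catch or propagate it.
  static byte[] encodeWithName(String s) {
    try {
      return s.getBytes(UTF8_NAME);
    } catch (UnsupportedEncodingException e) {
      // Not expected: every Java platform is required to support UTF-8.
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    // Both options produce the same bytes; only the call-site ergonomics differ.
    System.out.println(
        Arrays.equals(encodeWithCharset("row1"), encodeWithName("row1")));
  }
}
```

With roughly 1,800 call sites in play, the Charset object would avoid adding a try/catch or a throws clause at each one.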

I have found one instance of getBytes() in InputFormatBase:

  protected static byte[] getPassword(Configuration conf) {
    return Base64.decodeBase64(conf.get(PASSWORD, "").getBytes());
  }
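
A rough sketch of what the changed method might look like follows. To keep it self-contained, a plain Map stands in for the Hadoop Configuration and java.util.Base64 stands in for the commons-codec Base64.decodeBase64 call. The class name and the PASSWORD key value are assumptions made for illustration.

```java
import java.nio.charset.Charset;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the InputFormatBase method quoted above.
final class PasswordDecodeSketch {
  static final Charset UTF8 = Charset.forName("UTF-8");
  // Assumed key name; the real constant's value is not shown in this message.
  static final String PASSWORD = "password";

  // A Map replaces Configuration here; getOrDefault mirrors conf.get(key, "").
  static byte[] getPassword(Map<String, String> conf) {
    String encoded = conf.getOrDefault(PASSWORD, "");
    // The charset is now explicit instead of the platform default.
    return Base64.getDecoder().decode(encoded.getBytes(UTF8));
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<String, String>();
    String encoded = Base64.getEncoder().encodeToString("secret".getBytes(UTF8));
    conf.put(PASSWORD, encoded);
    System.out.println(new String(getPassword(conf), UTF8));
  }
}
```

Base64 text is plain ASCII, so this particular call would only misbehave under a platform default charset that is not ASCII-compatible. Call sites that encode arbitrary user strings are where the explicit charset matters more.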

Are there any reasons why I can't start specifying the charset? Is
UTF8 the right Charset to use? I am not an expert in non-English
charsets, so guidance would be welcome.
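
As a small illustration of why the choice matters for non-English text, the sketch below encodes the same strings with UTF-8 and with ISO-8859-1 and checks whether they survive a round trip. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;

// Hypothetical demo class; not part of Accumulo.
final class Utf8RoundTripSketch {
  static final Charset UTF8 = Charset.forName("UTF-8");
  static final Charset LATIN1 = Charset.forName("ISO-8859-1");

  // True if encoding and then decoding with the same charset returns the input.
  static boolean roundTrips(String s, Charset cs) {
    return s.equals(new String(s.getBytes(cs), cs));
  }

  public static void main(String[] args) {
    String accented = "caf\u00e9";      // "cafe" with an acute accent on the e
    String cjk = "\u65e5\u672c\u8a9e";  // three CJK characters

    // The same text yields different byte lengths under different charsets.
    System.out.println(accented.getBytes(UTF8).length);
    System.out.println(accented.getBytes(LATIN1).length);

    // UTF-8 can represent every Unicode character; ISO-8859-1 cannot,
    // and getBytes silently substitutes a replacement byte for unmappable ones.
    System.out.println(roundTrips(cjk, UTF8));
    System.out.println(roundTrips(cjk, LATIN1));
  }
}
```

Because getBytes() with no argument uses the platform default charset, the same string can produce different bytes on differently configured machines, which is the usual argument for pinning one charset everywhere.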
