accumulo-dev mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: Setting Charset in getBytes() call.
Date Wed, 31 Oct 2012 00:21:53 GMT
On 10/30/2012 7:47 PM, David Medinets wrote:
>> My issue with this is that you have now hard-coded the fact that
>> everyone else is going to use UTF-8.
>
> Who is everyone else? I agree that I have hard-coded the use of UTF-8.
> On the other hand, I've merely codified an existing practice. Thus the
> issue is now exposed, the places the convention is used are defined.
> Once a consensus is reached, we can implement it with confidence.

"Everyone else" is everyone who builds and uses Accumulo after your 
commit. That aside, forcing a single charset isn't the big issue here 
(we've *all* agreed that UTF-8 should not cause any data-correctness 
issues), so I'll drop that point for now, as it's only creating 
confusion.

My issue is *how* you implemented the default charset. We already have 3 
people (Marc, Bill and myself) who have stated that we believe inline 
charset declaration is not the correct implementation and that using the 
JVM property is the better implementation.

I'd encourage others to weigh in to form a complete consensus and shift 
the discussion to that implementation if needed.

>
>> way to fix the problem. I still contest that setting the desired encoding
>> (via the appropriate JVM property like Bill Slacum initially suggested) is the
>> proper way to address the issue.
>
> It is easy to do both. Create a ByteEncodingInitializer (or somesuch)
> class that reads the JVM property and defines a globally used Charset.
> Then find those utf8 definitions and usages and replace them with the
> globally-defined value.
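
For reference, the class proposed in the quoted text could look roughly like the sketch below. The class name comes from the quote; the method names, the choice of "file.encoding" as the property to read, and the UTF-8 fallback are all assumptions made for illustration, not anything that exists in Accumulo.

```java
import java.nio.charset.Charset;

// Hypothetical sketch of the proposed class; not part of Accumulo.
final class ByteEncodingInitializer {
    // Assumption: reuse the standard JVM property rather than a new one.
    static final String PROPERTY = "file.encoding";

    // Resolved once, when the class is first loaded.
    private static final Charset CHARSET = resolve(System.getProperty(PROPERTY));

    // Map a charset name to a Charset, falling back to UTF-8 (an assumed
    // default) when the name is missing, malformed, or unsupported.
    static Charset resolve(String name) {
        if (name == null || name.isEmpty()) {
            return Charset.forName("UTF-8");
        }
        try {
            return Charset.forName(name);
        } catch (IllegalArgumentException e) {
            // Covers IllegalCharsetNameException and UnsupportedCharsetException.
            return Charset.forName("UTF-8");
        }
    }

    // The single, globally shared charset.
    static Charset charset() {
        return CHARSET;
    }

    // Replacement for inline calls such as s.getBytes(utf8).
    static byte[] encode(String s) {
        return s.getBytes(CHARSET);
    }
}
```

Call sites would then use ByteEncodingInitializer.encode(value) instead of naming a charset inline.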

Again, if the "file.encoding" JVM parameter is set, such a class is 
unnecessary because the charset is handled inside Java itself. For the 
Oracle/Sun JDK and OpenJDK, setting "file.encoding" when the JVM is 
launched makes the default charset the one you provided, without 
changing any code.
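
As a rough illustration of the JVM-property approach: a no-argument getBytes() call encodes with the JVM's default charset, so it gives the same bytes as passing Charset.defaultCharset() explicitly. The class and method names below are made up for this sketch. Which values of "file.encoding" are honored can vary by JDK vendor and version (newer JDKs default to UTF-8), so treat this as something to check on your own JVM rather than a guarantee.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical demo class; the names are illustrative only.
class DefaultCharsetDemo {
    // No charset named in code: the JVM default is used.
    static byte[] encodeWithDefault(String s) {
        return s.getBytes();
    }

    // Charset named explicitly, as an inline declaration would do.
    static byte[] encodeExplicit(String s, Charset cs) {
        return s.getBytes(cs);
    }

    public static void main(String[] args) {
        Charset def = Charset.defaultCharset();
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + def);

        String sample = "caf\u00e9";
        // Both paths should agree, since getBytes() uses the default charset.
        System.out.println("same bytes      = "
            + Arrays.equals(encodeWithDefault(sample), encodeExplicit(sample, def)));
    }
}
```

Running it as `java -Dfile.encoding=UTF-8 DefaultCharsetDemo` and again with a different value shows whether a given JVM picks up the property at launch.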
