accumulo-dev mailing list archives

From Benson Margulies <bimargul...@gmail.com>
Subject Re: Setting Charset in getBytes() call.
Date Mon, 29 Oct 2012 20:02:36 GMT
On Mon, Oct 29, 2012 at 3:18 PM, John Vines <vines@apache.org> wrote:
> So perhaps we should have ISO-8859-1 as the standard. Mike- do you see any
> reason to use something beside ISO-8859-1 for the encodings?

I object and caution against *any* plan that involves transcoding from
X to UTF-16 and back when the data is not always going to be valid
bytes of encoding X. The only clean solution here is to have an API
entirely in terms of bytes, and either let the user call getBytes if
they want to store string data, or provide an additional API.
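A minimal Java sketch of the problem described above, assuming only the standard java.nio.charset API. A Java String is UTF-16 internally, so `new String(bytes, X)` is the X-to-UTF-16 step. When the bytes are not valid UTF-8, the decoder replaces the bad sequences with U+FFFD, and re-encoding does not give the original bytes back. The class name `Utf8RoundTrip` and the `bytesOnly` helper are hypothetical illustrations, not Accumulo API; `bytesOnly` just stands in for a bytes-only interface that never converts to String.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8RoundTrip {

    // Transcode arbitrary bytes through a String (UTF-16) using UTF-8 and back.
    // Malformed input is replaced with U+FFFD while decoding, so bytes that
    // are not valid UTF-8 are not preserved.
    static byte[] viaUtf8String(byte[] data) {
        String s = new String(data, StandardCharsets.UTF_8);
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Hypothetical bytes-only path: the value is kept and returned untouched.
    // Callers holding text call getBytes(StandardCharsets.UTF_8) themselves.
    static byte[] bytesOnly(byte[] data) {
        return data.clone();
    }

    public static void main(String[] args) {
        // 0xFF never occurs in UTF-8; 0xC3 0x28 is a broken two-byte sequence.
        byte[] binary = {(byte) 0xFF, (byte) 0xC3, 0x28, 0x41};

        System.out.println(Arrays.equals(binary, viaUtf8String(binary))); // false
        System.out.println(Arrays.equals(binary, bytesOnly(binary)));     // true

        // String data that the user encodes explicitly still round-trips.
        byte[] text = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.println("caf\u00e9".equals(
                new String(bytesOnly(text), StandardCharsets.UTF_8)));   // true
    }
}
```

Keeping the storage layer in byte[] pushes the choice of encoding to the caller, who is the only party that knows whether the data is text.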

>
> John
>
> On Mon, Oct 29, 2012 at 3:14 PM, Michael Flester <flester@gmail.com> wrote:
>
>> > UTF-8 should always be present (according to the JLS), and as a
>> > multi-byte format should be able to encode any character that you
>> > would need to.
>> >
>>
>> UTF-8 cannot encode arbitrary data. Not all data that we store in
>> Accumulo is characters. A safe encoding to use as a pass-through when
>> you don't know whether you are dealing with characters is ISO-8859-1,
>> since we know that we can make the round trip from bytes to string to
>> bytes without loss.
>>
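A minimal Java sketch of the ISO-8859-1 claim quoted above, assuming only the standard java.nio.charset API. ISO-8859-1 maps each byte 0x00-0xFF to the char U+0000-U+00FF, so bytes to String to bytes is the identity for every one of the 256 byte values. The class and helper names are hypothetical. Note the guarantee holds in that direction only: a String containing a character above U+00FF is encoded by getBytes as '?', which is the kind of hidden transcoding the reply above objects to.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1RoundTrip {

    // Decode with ISO-8859-1 and re-encode with ISO-8859-1.
    // Each byte becomes exactly one char in U+0000-U+00FF and back again.
    static byte[] viaLatin1String(byte[] data) {
        String s = new String(data, StandardCharsets.ISO_8859_1);
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Every possible byte value, once, in order 0x00..0xFF.
    static byte[] allByteValues() {
        byte[] all = new byte[256];
        for (int i = 0; i < 256; i++) {
            all[i] = (byte) i;
        }
        return all;
    }

    public static void main(String[] args) {
        byte[] all = allByteValues();
        System.out.println(Arrays.equals(all, viaLatin1String(all))); // true
    }
}
```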
