commons-user mailing list archives

From Luc Maisonobe <Luc.Maison...@free.fr>
Subject Re: problems with base64, ascii and php
Date Fri, 29 Aug 2008 10:51:52 GMT
Luc Maisonobe wrote:
> Nikola Petrović wrote:
>> Hello all,
>> I have a problem with the base64 codec and encoding ascii characters. Here's the code:
>> int ok = 0, bad = 0;
>> for (int i = 128; i < 256; i++) {
>>     char c = (char) i;
>>     String str = "" + c;
>>     String enc = new String(Base64.encodeBase64(str.getBytes()));
>>     String dec = new String(Base64.decodeBase64(enc.getBytes()));
>>     if (str.equals(dec)) ok++;
>>     else bad++;
>> }
>> System.out.println("ok: " + ok);
>> System.out.println("bad: " + bad);
>>
>> I get this:
>> ok: 80
>> bad: 48
>>
>> So, simple encoding and decoding of chars from 128 to 255 doesn't work for me. Is
>> there some explanation?
> 
> Both String.getBytes() and the String constructors that take bytes rely
> on some charset to convert between characters and bytes. I guess your JVM
> configuration has some inconsistencies in the default charset. On my
> Linux box with the default charset set to UTF-8, your code works well.
> 
> For robustness, I suggest you explicitly set the charset as follows:
> 
>   String enc = new String(Base64.encodeBase64(str.getBytes("UTF-8")),
>                           "US-ASCII");
>   String dec = new String(Base64.decodeBase64(enc.getBytes("US-ASCII")),
>                           "UTF-8");
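
For anyone reading this later, here is a self-contained sketch of the same
round trip with explicit charsets. It rests on assumptions that are not in
the original code: a Java 8 or later runtime, where java.util.Base64 stands
in for the commons-codec Base64 class, and a made-up class name
Base64RoundTrip with made-up helpers encode and decode.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical demo class; java.util.Base64 replaces commons-codec here.
class Base64RoundTrip {
    // characters -> UTF-8 bytes -> Base64 bytes -> ASCII text
    static String encode(String text) {
        byte[] raw = text.getBytes(StandardCharsets.UTF_8);
        return new String(Base64.getEncoder().encode(raw), StandardCharsets.US_ASCII);
    }

    // ASCII text -> Base64 bytes -> UTF-8 bytes -> characters
    static String decode(String base64) {
        byte[] raw = Base64.getDecoder().decode(base64.getBytes(StandardCharsets.US_ASCII));
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        int ok = 0, bad = 0;
        for (int i = 128; i < 256; i++) {
            String str = String.valueOf((char) i);
            if (str.equals(decode(encode(str)))) ok++;
            else bad++;
        }
        System.out.println("ok: " + ok);   // prints "ok: 128"
        System.out.println("bad: " + bad); // prints "bad: 0"
    }
}
```

Since no call uses the default charset, the result should not vary between
platforms. The Base64 text itself is plain ASCII, so any standard decoder
should recover the same bytes; the receiving side still has to interpret
those bytes as UTF-8.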

You should also have a look at the str = "" + c statement. The statement
itself does not depend on the default charset, since a Java char is a
Unicode (UTF-16) code unit, but the bytes you get from it do. UTF-8 uses
sequences of at least two bytes for non-ASCII characters, whereas a single
byte from 128 to 255 is valid only in some encodings (like ISO-8859-x),
with an encoding-specific meaning. For example, in ISO-8859-1 the character
190 (0xBE) is the 3/4 character (&frac34; in HTML), but in ISO-8859-15 it is
the capital Y with diaeresis (&Yuml; in HTML).
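
The 0xBE example can be checked with a small sketch like the one below. The
class name ByteMeaning and its helpers are made up for illustration, and it
assumes the JVM ships the ISO-8859-15 charset, which is common but is not
among the charsets every Java platform is required to support.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: one byte value, different meaning per charset.
class ByteMeaning {
    // Decode a single byte under the named charset.
    static String decodeByte(int value, String charsetName) {
        return new String(new byte[] { (byte) value }, Charset.forName(charsetName));
    }

    // Number of bytes UTF-8 needs for one char.
    static int utf8Length(char c) {
        return String.valueOf(c).getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String latin1 = decodeByte(0xBE, "ISO-8859-1");
        String latin9 = decodeByte(0xBE, "ISO-8859-15"); // assumed to be available
        System.out.printf("ISO-8859-1:  U+%04X%n", (int) latin1.charAt(0)); // U+00BE
        System.out.printf("ISO-8859-15: U+%04X%n", (int) latin9.charAt(0)); // U+0178
        System.out.println("UTF-8 bytes for U+00BE: " + utf8Length('\u00BE')); // 2
    }
}
```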

Beware that building a String from invalid byte sequences leads to
unspecified results.
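
One way to detect such input instead of silently getting a damaged String is
a CharsetDecoder configured to report errors. This is only a sketch; the
class StrictDecode and its methods are hypothetical names, and UTF-8 is
assumed as the expected encoding.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: strict UTF-8 decoding that rejects bad input.
class StrictDecode {
    // Decode bytes as UTF-8, throwing on malformed or unmappable input.
    static String decodeUtf8Strict(byte[] bytes) throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    // True if the bytes form a well-formed UTF-8 sequence.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            decodeUtf8Strict(bytes);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] lone = { (byte) 0xBE };              // continuation byte on its own
        byte[] pair = { (byte) 0xC2, (byte) 0xBE }; // UTF-8 encoding of U+00BE
        System.out.println(isValidUtf8(lone)); // false
        System.out.println(isValidUtf8(pair)); // true
    }
}
```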

Luc

> 
> Luc
> 
> 
>> I tried this on OS X with Java 5 and Java 6 and it doesn't work, but on
>> Java 6 on a Solaris PC it does. That confuses me totally :)
>>
>> The other part of the problem is probably linked to this one. I'm trying to
>> encode a string in my Java code and decode it in some PHP code that I use.
>> Is that possible? I guess since base64 is a standard it should work, and yet
>> it doesn't on OS X, nor on Solaris :( Maybe it is linked somehow to the
>> ascii char problem?
>>
>> Thanks,
>> Nikola
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscribe@commons.apache.org
>> For additional commands, e-mail: user-help@commons.apache.org
>>
>>

