axis-java-dev mailing list archives

From Jens Schumann <j...@void.fm>
Subject Re: don't understand your patch on axis
Date Mon, 01 Dec 2003 17:06:34 GMT
Moving discussion to the list.



On 12/1/03 11:33 AM Cédric Chabanois <CChabanois@natsystem.fr> wrote:

> Thanks for your answer.
> 
> What I don't understand is your third point : 3. Encode all other chars
> depending on chosen encoding.
> 
> 
> Why don't we have something like (which does not use UTF-8 or UTF-16) :
> (not optimized, don't know if it even compiles ...)
> "
>   public String encode(String xmlString) {
>       if(xmlString == null) {
>           return "";
>       }
>       char[] characters = xmlString.toCharArray();
>       char character;
>       StringBuffer out = new StringBuffer();
> 
>       for (int i = 0; i < characters.length; i++) {
>           character = characters[i];
>           switch (character) {
>               // we don't care about single quotes since axis will
>               // use double quotes anyway
>               case '&':
>                   out.append(AMP);
>                   break;
>               case '"':
>                   out.append(QUOTE);
>                   break;
>               case '<':
>                   out.append(LESS);
>                   break;
>               case '>':
>                   out.append(GREATER);
>                   break;
>               case '\n':
>                   out.append(LF);
>                   break;
>               case '\r':
>                   out.append(CR);
>                   break;
>               case '\t':
>                   out.append(TAB);
>                   break;
>               default:
>                   if (character < 0x20) {
>                       throw new IllegalArgumentException(
>                           Messages.getMessage("invalidXmlCharacter00",
>                               Integer.toHexString(character), xmlString));
>                   } else {
>                       out.append(character);
>                   }
>                   break;
>           }
>       }
> 
>       return out.toString();
>   }
> "
> 
> UTF8Encoder and UTF16Encoder would not be needed.
> 
> Thanks, 
> 
> Cédric
> 
> 
> 
>> Hi Cedric,
>> 
>> I almost missed your mail within my spam after a few days traveling ;)
>> 
>> See comments inline.
>> 
>> On 11/28/03 11:46 AM Cédric Chabanois <CChabanois@natsystem.fr> wrote:
>> 
>>> Hi,
>>> 
>>> I think you sent a patch some months ago concerning encoding.
>>> 
>>> out.toString() converted the bytes to a string using the platform's default
>>> charset but the bytes (in UTF-8) were not valid for my default charset.
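The mismatch described above can be reproduced with a minimal sketch. It assumes the serializer buffers UTF-8 bytes in a ByteArrayOutputStream; the class and method names (BufferDecoding, utf8Buffer, decode) are hypothetical and are not taken from Axis, and ISO-8859-1 merely stands in for "some platform default that is not UTF-8".

```java
import java.io.ByteArrayOutputStream;
import java.io.UnsupportedEncodingException;

// Hypothetical helper, not part of Axis: shows why decoding UTF-8 bytes
// with a different charset corrupts non-ASCII characters.
final class BufferDecoding {

    /** Writes the text into a buffer as UTF-8 bytes, as a serializer might. */
    static ByteArrayOutputStream utf8Buffer(String text) {
        try {
            byte[] bytes = text.getBytes("UTF-8");
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            out.write(bytes, 0, bytes.length);
            return out;
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    /** Decodes the buffer with an explicitly named charset. */
    static String decode(ByteArrayOutputStream out, String charsetName) {
        try {
            return out.toString(charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ByteArrayOutputStream out = utf8Buffer("C\u00e9dric");
        // Matching charset: the original text comes back.
        System.out.println(decode(out, "UTF-8").equals("C\u00e9dric"));
        // Mismatched charset: the two UTF-8 bytes of the e-acute become two chars.
        System.out.println(decode(out, "ISO-8859-1").length());
    }
}
```

Calling out.toString() with no argument behaves like the second call whenever the platform default is not UTF-8, which is presumably why naming the charset explicitly resolves the problem.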
>> 
>> Yes, saw that. Thanks for the fix. As I said on the list I was not able to
>> follow the discussion so far.
>> 
>> 
>>> However I don't understand the encode method in AbstractXMLEncoder.
>>> Why do you use UTF-8 or UTF-16 there ?
>>> The characters are converted back to a string so whatever the encoding is
>>> (UTF-8, UTF-16 or any other), the result will be the same.
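The round-trip argument above can be checked with a short sketch. The class name RoundTrip is hypothetical, and java.nio.charset.StandardCharsets requires a newer JDK than the one in use at the time of this thread. The claim holds for well-formed strings when the target charset can represent every character; unpaired surrogates would not survive the trip.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: encoding a String to bytes and decoding it again with
// the same charset returns the original text, regardless of which Unicode
// encoding is used in between.
final class RoundTrip {

    /** Encodes the text with the given charset and decodes it straight back. */
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String text = "a<b & \u00e9\u20ac";
        String viaUtf8 = roundTrip(text, StandardCharsets.UTF_8);
        String viaUtf16 = roundTrip(text, StandardCharsets.UTF_16);
        System.out.println(viaUtf8.equals(viaUtf16)); // same String either way
    }
}
```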
>> 
>> OK. The current AbstractXMLEncoder is just a derived class, and I don't have
>> the original handy. I will try to explain what I tried to achieve with an
>> abstract base class:
>> 
>> My proposed patch used the new encoder during SOAP envelope string
>> serialization only. Axis uses its own XML serialization mechanism. During
>> string serialization I tried to ensure the following:
>> 
>> 1. Check for invalid chars (0x00, ...).
>> 2. Encode a bunch of special chars (<,>,&,...) by using standard xml
>> entities.
>> 3. Encode all other chars depending on chosen encoding.
>> 4. Skip as much as possible if encoding isn't required.
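The four steps above can be sketched roughly as follows. This is not the patch under discussion: the class XmlTextEscaper is hypothetical, step 3 is interpreted here as "write characters the target charset cannot represent as numeric character references", and tab, line feed and carriage return are passed through unchanged, unlike in Cédric's version. All three choices are assumptions.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the four steps; not the Axis implementation.
final class XmlTextEscaper {

    static String escape(String text, Charset target) {
        if (text == null) {
            return "";
        }
        CharsetEncoder encoder = target.newEncoder();
        StringBuilder out = null; // step 4: allocated only once escaping is needed
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            int len = Character.charCount(cp);
            String replacement = replacementFor(cp, text.subSequence(i, i + len), encoder);
            if (replacement != null && out == null) {
                out = new StringBuilder(text.length() + 16).append(text, 0, i);
            }
            if (out != null) {
                if (replacement != null) {
                    out.append(replacement);
                } else {
                    out.append(text, i, i + len);
                }
            }
            i += len;
        }
        return out == null ? text : out.toString();
    }

    /** Returns the escaped form of one code point, or null if it can stay as-is. */
    private static String replacementFor(int cp, CharSequence chars, CharsetEncoder encoder) {
        switch (cp) {
            // step 2: predefined XML entities
            case '&': return "&amp;";
            case '<': return "&lt;";
            case '>': return "&gt;";
            case '"': return "&quot;";
            case '\t': case '\n': case '\r': return null; // legal XML whitespace
            default:
                // step 1: reject control chars and unpaired surrogates
                if (cp < 0x20 || (cp >= 0xD800 && cp <= 0xDFFF)) {
                    throw new IllegalArgumentException(
                        "invalid XML character 0x" + Integer.toHexString(cp));
                }
                // step 3 (assumed meaning): fall back to a character reference
                if (!encoder.canEncode(chars)) {
                    return "&#x" + Integer.toHexString(cp) + ";";
                }
                return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(escape("C\u00e9dric <dev>", StandardCharsets.US_ASCII));
    }
}
```

With UTF-8 or UTF-16 as the target, step 3 never triggers for well-formed input, because both encodings can represent every Unicode character. That is consistent with the observation that the result does not depend on which of the two is chosen.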
>> 
>> In my understanding 1 & 2 are encoding independent, especially with the WS-I
>> Profile mandatory encodings UTF-8 and UTF-16 (chars below 0x7f or 0xffff
>> remain the same). It might be that my assumption isn't true for 2, however
>> all I do within the abstract base class is to ensure 1 & 2.
>> 
>> The getBytes() representation was used to speed up adding a valid xml text
>> representation of the related character. Again, I made the assumption that
>> "&" and ";" are encoding independent.
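Whether "&" and ";" are encoding independent depends on the level at which one looks. The sketch below (hypothetical class EntityBytes, using java.nio.charset.StandardCharsets from a newer JDK) compares the bytes of "&amp;" across charsets: they are identical for ASCII-compatible encodings such as UTF-8, but not for UTF-16, where every character takes two bytes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo: compares the byte form of an XML entity across charsets.
final class EntityBytes {

    static final String AMP = "&amp;";

    /** True if both charsets encode the entity to the same byte sequence. */
    static boolean sameBytes(Charset a, Charset b) {
        return Arrays.equals(AMP.getBytes(a), AMP.getBytes(b));
    }

    public static void main(String[] args) {
        // ASCII-compatible encodings agree byte for byte.
        System.out.println(sameBytes(StandardCharsets.UTF_8, StandardCharsets.US_ASCII));
        // UTF-16BE uses two bytes per character, so the bytes differ.
        System.out.println(sameBytes(StandardCharsets.UTF_8, StandardCharsets.UTF_16BE));
    }
}
```

As characters the entity is the same in every encoding, so the assumption holds if the cached bytes are decoded with the charset that produced them. A byte-level cache shared between UTF-8 and UTF-16 output would presumably not be safe.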
>> 
>> To be honest I don't understand your question "Why do you use UTF-8 or
>> UTF-16 there ?". Could you please rephrase this question?
>> 
>> Thanks,
>> 
>> Jens   
>> 
>> 
>> cc: dims
>> 

