Moving discussion to the list.
On 12/1/03 11:33 AM Cédric Chabanois <CChabanois@natsystem.fr> wrote:
> Thanks for your answer.
>
> What I don't understand is your third point : 3. Encode all other chars
> depending on chosen encoding.
>
>
> Why don't we have something like (which does not use UTF-8 or UTF-16) :
> (not optimized, don't know if it even compiles ...)
> "
> public String encode(String xmlString) {
>     if (xmlString == null) {
>         return "";
>     }
>     char[] characters = xmlString.toCharArray();
>     char character;
>     // Named "out" so that it matches the append calls below.
>     StringBuffer out = new StringBuffer();
>
>     // AMP, QUOTE, LESS, GREATER, LF, CR and TAB are assumed to be
>     // String constants defined in the enclosing class.
>     for (int i = 0; i < characters.length; i++) {
>         character = characters[i];
>         switch (character) {
>             // we don't care about single quotes since axis will
>             // use double quotes anyway
>             case '&':
>                 out.append(AMP);
>                 break;
>             case '"':
>                 out.append(QUOTE);
>                 break;
>             case '<':
>                 out.append(LESS);
>                 break;
>             case '>':
>                 out.append(GREATER);
>                 break;
>             case '\n':
>                 out.append(LF);
>                 break;
>             case '\r':
>                 out.append(CR);
>                 break;
>             case '\t':
>                 out.append(TAB);
>                 break;
>             default:
>                 if (character < 0x20) {
>                     throw new IllegalArgumentException(
>                         Messages.getMessage("invalidXmlCharacter00",
>                             Integer.toHexString(character), xmlString));
>                 } else {
>                     out.append(character);
>                 }
>                 break;
>         }
>     }
>
>     return out.toString();
> }
> "
>
> UTF8Encoder and UTF16Encoder would not be needed.
>
> Thanks,
>
> Cédric
>
>
>
>> Hi Cedric,
>>
>> I almost missed your mail within my spam after a few days traveling ;)
>>
>> See comments inline.
>>
>> On 11/28/03 11:46 AM Cédric Chabanois <CChabanois@natsystem.fr> wrote:
>>
>>> Hi,
>>>
>>> I think you sent a patch some months ago concerning encoding.
>>>
>>> out.toString() converted the bytes to a string using the platform's
>>> default charset, but the bytes (in UTF-8) were not valid for my default
>>> charset.
>>
>> Yes, saw that. Thanks for the fix. As I said on the list, I was not able
>> to follow the discussion so far.
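
For anyone joining the thread here, a minimal sketch of the problem described above. It is not the Axis code: the class and method names are made up for illustration, it assumes a Java version that ships java.nio.charset.StandardCharsets, and ISO-8859-1 merely stands in for "a platform default that is not UTF-8".

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class CharsetDecodeDemo {

    // Decode with an explicit charset instead of relying on the platform default.
    static String decodeUtf8(ByteArrayOutputStream out) {
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "C\u00e9dric"; // "Cédric", written with an escape to be source-encoding safe
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(utf8, 0, utf8.length);

        // Stand-in for out.toString() on a platform whose default charset is ISO-8859-1.
        String wrong = new String(out.toByteArray(), StandardCharsets.ISO_8859_1);
        String right = decodeUtf8(out);

        System.out.println(wrong.equals(original)); // false: the two UTF-8 bytes of 'é' become two chars
        System.out.println(right.equals(original)); // true
    }
}
```

The point is only that the charset used for decoding has to match the one used for encoding; which API Axis should use for that is a separate question.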
>>
>>
>>> However I don't understand the encode method in AbstractXMLEncoder.
>>> Why do you use UTF-8 or UTF-16 there ?
>>> The characters are converted back to a string, so whatever the encoding
>>> is (UTF-8, UTF-16 or any other), the result will be the same.
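
A small sketch of the round trip being described here, in case it helps the discussion. The helper name roundTrip is hypothetical and this is not taken from AbstractXMLEncoder. It also shows the caveat: the result is only the same for charsets that can represent every character in the string.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RoundTripDemo {

    // Encode a string to bytes and decode it again with the same charset.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String text = "a < b & \u00e9\u20ac"; // contains 'é' and the euro sign

        // UTF-8 and UTF-16 can represent every character, so both give back the input.
        System.out.println(roundTrip(text, StandardCharsets.UTF_8).equals(text));  // true
        System.out.println(roundTrip(text, StandardCharsets.UTF_16).equals(text)); // true

        // ISO-8859-1 has no euro sign; getBytes() silently substitutes '?'.
        System.out.println(roundTrip(text, StandardCharsets.ISO_8859_1).equals(text)); // false
    }
}
```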
>>
>> OK. The current AbstractXMLEncoder is just a derived class, and I don't
>> have the original handy. I will try to explain what I tried to achieve
>> with an abstract base class:
>>
>> My proposed patch used the new encoder during SOAP envelope string
>> serialization only. Axis uses its own XML serialization mechanism. During
>> string serialization I tried to ensure the following:
>>
>> 1. Check for invalid chars (0x00, ...).
>> 2. Encode a bunch of special chars (<,>,&,...) by using standard xml
>> entities.
>> 3. Encode all other chars depending on chosen encoding.
>> 4. Skip as much as possible if encoding isn't required.
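
To make the four points concrete, here is a rough sketch of how they could fit together in one method. It is not the patch itself. The names XmlTextEscaper and escape are hypothetical, and the reading of point 3 (emit a numeric character reference when the target charset cannot represent a character) is an assumption on my side that the thread does not confirm. Unlike the snippet quoted earlier, tab, LF and CR are passed through unchanged.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class XmlTextEscaper {

    static String escape(String text, Charset charset) {
        if (text == null) {
            return "";
        }
        CharsetEncoder encoder = charset.newEncoder();
        StringBuilder out = null; // point 4: only allocated once something must change
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            int len = Character.charCount(cp);
            String replacement = null;
            switch (cp) {
                // point 2: standard XML entities
                case '&': replacement = "&amp;"; break;
                case '<': replacement = "&lt;"; break;
                case '>': replacement = "&gt;"; break;
                case '"': replacement = "&quot;"; break;
                case '\t': case '\n': case '\r': break; // legal XML whitespace
                default:
                    // point 1: characters outside the XML 1.0 Char production
                    if (cp < 0x20 || (cp >= 0xD800 && cp <= 0xDFFF)
                            || cp == 0xFFFE || cp == 0xFFFF) {
                        throw new IllegalArgumentException(
                            "Invalid XML character 0x" + Integer.toHexString(cp));
                    }
                    // point 3 (assumed meaning): reference what the charset cannot encode
                    if (!encoder.canEncode(text.subSequence(i, i + len))) {
                        replacement = "&#x" + Integer.toHexString(cp) + ";";
                    }
            }
            if (replacement != null) {
                if (out == null) {
                    // Copy the untouched prefix the first time a change is needed.
                    out = new StringBuilder(text.length() + 16).append(text, 0, i);
                }
                out.append(replacement);
            } else if (out != null) {
                out.append(text, i, i + len);
            }
            i += len;
        }
        return out == null ? text : out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("a<b & \"c\"", StandardCharsets.UTF_8));
        System.out.println(escape("price: \u20ac", StandardCharsets.ISO_8859_1));
    }
}
```

With UTF-8 or UTF-16 as the target, the canEncode branch never produces a reference for a valid character, which is roughly why points 1 and 2 can live in a shared base class.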
>>
>> In my understanding 1 & 2 are encoding independent, especially with the
>> WS-I Profile mandatory encodings UTF-8 and UTF-16 (chars below 0x7F or
>> 0xFFFF remain the same). It might be that my assumption isn't true for 2;
>> however, all I do within the abstract base class is to ensure 1 & 2.
>>
>> The getBytes() representation was used to speed up adding a valid XML
>> text representation of the related character. Again, I made the
>> assumption that "&" and ";" are encoding independent.
>>
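
On the assumption about "&" and ";": a quick check, sketched below with a hypothetical class name, suggests it holds at the character level but not at the byte level. The ASCII characters of an entity have the same byte values in US-ASCII and UTF-8, while UTF-16 uses two bytes per character, so a precomputed getBytes() result would have to be kept per encoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EntityBytesDemo {

    public static void main(String[] args) {
        String entity = "&amp;";

        byte[] ascii = entity.getBytes(StandardCharsets.US_ASCII);
        byte[] utf8 = entity.getBytes(StandardCharsets.UTF_8);
        byte[] utf16 = entity.getBytes(StandardCharsets.UTF_16BE); // big-endian, no byte order mark

        // Same five bytes in US-ASCII and UTF-8.
        System.out.println(Arrays.equals(ascii, utf8)); // true

        // Two bytes per character in UTF-16.
        System.out.println(utf16.length);               // 10
        System.out.println(Arrays.toString(utf16));     // [0, 38, 0, 97, 0, 109, 0, 112, 0, 59]
    }
}
```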
>> To be honest, I don't understand your question "Why do you use UTF-8 or
>> UTF-16 there?". Could you please rephrase this question?
>>
>> Thanks,
>>
>> Jens
>>
>>
>> cc: dims
>>