axis-java-dev mailing list archives

From Davanum Srinivas <d...@apache.org>
Subject Re: don't understand your patch on axis
Date Mon, 01 Dec 2003 17:20:49 GMT
Jens,

+1 to simplify/speed-up the code. We can worry about UTF-32 when we get around to it :) Please
reopen one of the bugs and upload your patch as usual.

Thanks,
dims

--- Jens Schumann <jens@void.fm> wrote:
> Assuming your example is meant to illustrate that we (now) use
> java.lang.String UTF-8/UTF-16 encoding anyway, I see what you are talking about.
> 
> Good point. Which in turn means that my proposed encoder falls
> short right now.
> 
> Dims, I hope you read this by accident:
> 
> Assuming we support only UTF-8/UTF-16 encoding and use a String as
> return type, the pluggable Encoder seems to be unnecessary.
> 
> However I believe the current Encoder is way too expensive. We create a lot
> of temporary Strings just to add them to a StringBuffer or write them to
> a Writer instance.
> If we return byte[] or char[] directly, we avoid the unnecessary String
> objects created at the end of AbstractXMLEncoder#encode, and we may also
> improve performance a little bit. Should be tested though ;)
> 
> On the other hand I think the pluggable approach should remain, since we
> might run into UTF-32 requirements at some point (which I believe isn't
> supported at least on OS X directly)
> 
> 
> What do you think?
> 
> Jens
> 
> 
> 
> 
> 
> On 12/1/03 11:33 AM Cédric Chabanois <CChabanois@natsystem.fr> wrote:
> 
> > Thanks for your answer.
> > 
> > What I don't understand is your third point : 3. Encode all other chars
> > depending on chosen encoding.
> > 
> > 
> > Why don't we have something like (which does not use UTF-8 or UTF-16) :
> > (not optimized, don't know if it even compiles ...)
> > "
> >   public String encode(String xmlString) {
> >       if(xmlString == null) {
> >           return "";
> >       }
> >       char[] characters = xmlString.toCharArray();
> >       char character;
> >       StringBuffer out = new StringBuffer();
> > 
> >       for (int i = 0; i < characters.length; i++) {
> >           character = characters[i];
> >           switch (character) {
> >               // we don't care about single quotes since axis will
> >               // use double quotes anyway
> >               case '&':
> >                   out.append(AMP);
> >                   break;
> >               case '"':
> >                   out.append(QUOTE);
> >                   break;
> >               case '<':
> >                   out.append(LESS);
> >                   break;
> >               case '>':
> >                   out.append(GREATER);
> >                   break;
> >               case '\n':
> >                   out.append(LF);
> >                   break;
> >               case '\r':
> >                   out.append(CR);
> >                   break;
> >               case '\t':
> >                   out.append(TAB);
> >                   break;
> >               default:
> >                   if (character < 0x20) {
> >                       throw new IllegalArgumentException(
> >                           Messages.getMessage("invalidXmlCharacter00",
> >                               Integer.toHexString(character), xmlString));
> >                   } else {
> >                       out.append(character);
> >                   }
> >                   break;
> >           }
> >       }
> > 
> >       return out.toString();
> >   }
> > "
> > 
> > UTF8Encoder and UTF16Encoder would not be needed.
> > 
> > Thanks, 
> > 
> > Cédric
> > 
> > 
> > 
> >> Hi Cedric,
> >> 
> >> I almost missed your mail within my spam after a few days traveling ;)
> >> 
> >> See comments inline.
> >> 
> >> On 11/28/03 11:46 AM Cédric Chabanois <CChabanois@natsystem.fr> wrote:
> >> 
> >>> Hi,
> >>> 
> >>> I think you sent a patch some months ago concerning encoding.
> >>> 
> >>> out.toString() converted the bytes to a string using the platform's default
> >>> charset but the bytes (in UTF-8) were not valid for my default charset.
> >> 
> >> Yes, saw that. Thanks for the fix. As I said on the list I was not able to
> >> follow the discussion so far.
> >> 
> >> 
> >>> However I don't understand the encode method in AbstractXMLEncoder.
> >>> Why do you use UTF-8 or UTF-16 there ?
> >>> The characters are converted back to a string so whatever the encoding is
> >>> (UTF-8, UTF-16 or any other), the result will be the same.
> >> 
> >> OK. The current AbstractXMLEncoder is just a derived class, and I don't have
> >> the original handy. I will try to explain what I tried to achieve with an
> >> abstract base class:
> >> 
> >> My proposed patch used the new encoder during soap envelope string
> >> serialization only. Axis uses its own xml serialization mechanism. During
> >> string serialization I tried to ensure the following:
> >> 
> >> 1. Check for invalid chars (0x00, ...).
> >> 2. Encode a bunch of special chars (<,>,&,...) by using standard xml entities.
> >> 3. Encode all other chars depending on chosen encoding.
> >> 4. Skip as much as possible if encoding isn't required.
> >> 
> >> In my understanding 1 & 2 are encoding independent, especially with the WS-I
> >> Profile mandatory encodings UTF-8 and UTF-16 (chars below 0x7f or 0xffff
> >> remain the same). It might be that my assumption isn't true for 2, however
> >> all I do within the abstract base class is to ensure 1 & 2.
> >> 
> >> The getBytes() representation was used to speed up adding a valid xml text
> >> representation of the related character. Again, I made the assumption that
> >> "&" and ";" are encoding independent.
> >> 
> >> To be honest I don't understand your question "Why do you use UTF-8 or
> >> UTF-16 there ?". Could you please rephrase this question?
> >> 
> >> Thanks,
> >> 
> >> Jens   
> >> 
> >> 
> >> cc: dims
> >> 
> 
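
The charset problem quoted above (UTF-8 bytes turned back into a String with the platform's default charset) can be reproduced in isolation. What follows is only a minimal sketch, not Axis code: the class name CharsetRoundTrip and both helper methods are hypothetical, ISO-8859-1 stands in for a non-UTF-8 platform default, and StandardCharsets assumes Java 7 or later. It also illustrates the point that encoding to UTF-8 or UTF-16 and decoding with the same charset gives back the original String.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Axis.
class CharsetRoundTrip {

    // Encode to bytes and decode again with the SAME charset.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    // Write UTF-8 bytes into a buffer, then decode them with the named charset.
    // Passing a charset other than UTF-8 simulates a mismatched platform default.
    static String decodeUtf8BytesAs(String text, String charsetName) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(text.getBytes(StandardCharsets.UTF_8));
        return out.toString(charsetName);
    }

    public static void main(String[] args) throws IOException {
        String text = "caf\u00e9 & <tag>";

        // Same charset on both sides: the text survives unchanged.
        System.out.println(roundTrip(text, StandardCharsets.UTF_8).equals(text));
        System.out.println(roundTrip(text, StandardCharsets.UTF_16).equals(text));

        // Mismatched charset: the two UTF-8 bytes of U+00E9 become two characters.
        System.out.println(decodeUtf8BytesAs(text, "ISO-8859-1").equals(text));
        System.out.println(decodeUtf8BytesAs(text, "UTF-8").equals(text));
    }
}
```

Naming the charset explicitly in toString(...) is what removes the dependency on the platform default in this sketch.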


=====
Davanum Srinivas - http://webservices.apache.org/~dims/
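
One possible shape for the "avoid temporary Strings" idea from the quoted discussion is to write straight to a Writer and copy unescaped runs of characters in one call. This is a rough sketch under stated assumptions, not the actual Axis patch: the class StreamingXmlEscaper and its methods are hypothetical, the entity strings are the standard XML ones (the values of AMP, QUOTE, LESS and GREATER in Axis are not shown in this thread), and tab, LF and CR are passed through unchanged here rather than escaped.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical sketch, not part of Axis.
class StreamingXmlEscaper {

    // Returns the entity for a character that must be escaped, or null otherwise.
    static String entityFor(char c) {
        switch (c) {
            case '&': return "&amp;";
            case '<': return "&lt;";
            case '>': return "&gt;";
            case '"': return "&quot;";
            default:  return null;
        }
    }

    // Writes the escaped form of text to out without building intermediate Strings.
    static void writeEncoded(String text, Writer out) throws IOException {
        if (text == null) {
            return;
        }
        int length = text.length();
        int runStart = 0; // start of the pending run that needs no escaping
        for (int i = 0; i < length; i++) {
            char c = text.charAt(i);
            // Assumption: control chars other than tab, LF and CR are rejected.
            if (c < 0x20 && c != '\t' && c != '\n' && c != '\r') {
                throw new IllegalArgumentException(
                    "Invalid XML character 0x" + Integer.toHexString(c));
            }
            String entity = entityFor(c);
            if (entity != null) {
                out.write(text, runStart, i - runStart); // flush the plain run
                out.write(entity);
                runStart = i + 1;
            }
        }
        out.write(text, runStart, length - runStart); // flush the tail
    }

    public static void main(String[] args) throws IOException {
        StringWriter w = new StringWriter();
        writeEncoded("a < b & \"c\"", w);
        System.out.println(w); // a &lt; b &amp; &quot;c&quot;
    }
}
```

Whether this is actually faster than appending to a StringBuffer would need to be measured, as noted in the thread.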
