axis-java-dev mailing list archives

From "Russell Butek" <bu...@us.ibm.com>
Subject Re: [axis] LF vs CR/LF in axis generated code
Date Wed, 05 Jun 2002 14:49:44 GMT
We've got to be clear here.  We're talking about 2 different things.
1.  wire message values
2.  generated code

My statement applies to generated code.
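
To make the distinction concrete, here is a minimal sketch, not Axis code. The class and method names (LineEndings, generatedSource, requestLine) are made up for illustration. It assumes generated source is written through a PrintWriter, whose println uses the platform line separator, while a wire message spells out its terminator explicitly.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

public class LineEndings {
    // Generated code: println() uses the platform line separator,
    // so the system decides between LF and CR/LF.
    static String generatedSource() {
        StringWriter out = new StringWriter();
        PrintWriter pw = new PrintWriter(out);
        pw.println("public class Stub {");
        pw.println("}");
        pw.flush();
        return out.toString();
    }

    // Wire message: the terminator is fixed by the protocol,
    // so it is written out explicitly as CR LF.
    static String requestLine() {
        return "GET / HTTP/1.0" + "\r\n";
    }

    public static void main(String[] args) {
        System.out.println(generatedSource().endsWith(System.lineSeparator()));
        System.out.println(requestLine().endsWith("\r\n"));
    }
}
```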

Russell Butek
butek@us.ibm.com


dirkx@covalent.net on 06/04/2002 06:50:44 PM

Please respond to axis-dev@xml.apache.org

To:    axis-dev@xml.apache.org
cc:
Subject:    Re: [axis] LF vs CR/LF in axis generated code




On Tue, 4 Jun 2002, Russell Butek wrote:

> I think we should let the system do it.  Always.  If I've coded "\n" in the
> WSDL2java code, feel free to slap me around a bit.

The spec tells you to make sure that your implementation sends out the
wire values (in bytes) of

 GET / HTTP/1.0<cr><lf>

i.e. in Hex

 { x47,x45,x54,x20,x2f,x20,x48,x54,x54,
  x50,x2f,x31,x2e,x30,x0d,x0a };

so if you are sure that your

 String j = "GET / HTTP/1.0\r\n"

when written out on the network yields those bytes being sent out in
exactly that order; *regardless* of the LOCALE the user may have set or
other properties of the compilers(*), JVM, etc - then you are fine.

Which you are in most Java(*) environments.

Otherwise you'll need to do something more fundamental when you convert
down to wire format.
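
One way to take the locale and the platform default charset out of the picture is to name the charset at the point of conversion. The following is only a sketch under that assumption; WireFormat and toWireBytes are hypothetical names, and it uses java.nio.charset.StandardCharsets, which assumes Java 7 or later.

```java
import java.nio.charset.StandardCharsets;

public class WireFormat {
    // Hypothetical helper: convert a request line to wire bytes with an
    // explicit charset instead of relying on the platform default.
    static byte[] toWireBytes(String line) {
        return line.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] wire = toWireBytes("GET / HTTP/1.0\r\n");
        StringBuilder hex = new StringBuilder();
        for (byte b : wire) {
            hex.append(String.format("x%02x ", b));
        }
        // Print the bytes in the same xNN notation as the listing above.
        System.out.println(hex.toString().trim());
    }
}
```

Note that getBytes with US_ASCII silently replaces characters it cannot map with '?', so this only guards the byte values of characters that are ASCII to begin with.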

Dw

* This is not as trivial as it sounds; some charsets will
  substitute the lowercase 'l' for a '1' when mapping from
  that charset to the ISO or ASCII charset. Likewise some charsets
  simply lack certain characters; say the H or the K, as they
  are not used in those languages - and you may get country- or
  language-specific substitutions such as a 'G' or a
  'Kappa' symbol.

  One particularly fun area is, for example, querying (Oracle) databases
  from a locale different from the one the data was programmed/entered in.
  And even though everything is UTF-8/Unicode, an 'e'-trema may be received
  back as a single code point -or- as an 'e'-'backspace'-'trema'. Or a
  Cyrillic 'c' may end up being another code point; etc. I.e. they all look
  the same to the user as glyphs - i.e. in their shape - but their code
  points or sequence of bytes internally may differ from what the
  programmer expected.
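
  The 'e'-trema case can be illustrated with a small sketch. It assumes the two-part form is 'e' followed by the combining diaeresis U+0308, and it uses java.text.Normalizer (Java 6 or later). TremaForms and sameAfterNfc are hypothetical names.

```java
import java.text.Normalizer;

public class TremaForms {
    // Compare two strings after normalizing both to NFC (composed form).
    static boolean sameAfterNfc(String a, String b) {
        return Normalizer.normalize(a, Normalizer.Form.NFC)
                .equals(Normalizer.normalize(b, Normalizer.Form.NFC));
    }

    public static void main(String[] args) {
        String precomposed = "\u00eb";  // e-trema as a single code point
        String decomposed = "e\u0308";  // 'e' plus combining diaeresis
        String latinC = "c";            // Latin small letter c
        String cyrillicEs = "\u0441";   // Cyrillic small letter es, looks like 'c'

        System.out.println(precomposed.equals(decomposed));       // false
        System.out.println(sameAfterNfc(precomposed, decomposed)); // true
        System.out.println(sameAfterNfc(latinC, cyrillicEs));      // false
    }
}
```

  Normalization reconciles the two e-trema forms, but it does not unify look-alike letters from different scripts, so the Cyrillic case still needs separate handling.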



