axis-java-dev mailing list archives

From Davanum Srinivas <d...@yahoo.com>
Subject Re: Axis mis-encodes Strings w/ invalid characters for SOAP transport
Date Tue, 16 Dec 2003 22:39:10 GMT
Try Axis 1.2 alpha.

--- Ryan Choi <RChoi@salesforce.com> wrote:
> Axis doesn’t seem to be properly XML-encoding string values in SOAP requests/responses. More
> specifically, org.apache.axis.utils.XmlUtils isn't stripping out invalid characters before
> sending them across the wire. An example of such an invalid string is:
> 
>
2002”N2ŒŽ1“ú³Ž®Œ_–ñ•ª‚æ‚èŬƒ‰ƒCƒZƒ“ƒX”‚ð•ÏX‚¢‚½‚µ‚Ü‚·
> 
> In this case, there is a definite null character, which is not legal XML, being sent over the
> wire. An Axis client receiving this response chokes in parsing the XML.
> 
> It looks like the problem may be in org.apache.axis.utils.XmlUtils. The xmlEncodeString() method
> only encodes the string if any of '&', '"', '\'', '<' or '>' is found. If none are found, it
> just returns the original string (even if it has OTHER invalid characters) and writes it as-is.
> 
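To see whether a given string is affected, something like the following could be used. It is only a sketch, not Axis code: the class name XmlCharCheck and both method names are made up for illustration, the ranges are those of the XML 1.0 Char production, and String.codePointAt needs Java 5 or later (so it would not compile on the JDK 1.4.1 mentioned below).

```java
public class XmlCharCheck {

    // True if the code point is allowed by the XML 1.0 "Char" production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isLegalXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns the index of the first code point that is not legal XML,
    // or -1 if the whole string can be written to an XML document.
    static int firstIllegalIndex(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isLegalXmlChar(cp)) {
                return i;
            }
            // Step over one or two chars depending on the code point.
            i += Character.charCount(cp);
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstIllegalIndex("plain text"));  // prints -1
        System.out.println(firstIllegalIndex("ab\u0000cd"));  // prints 2
    }
}
```

A string with none of the five markup characters but with an embedded null, like the second call above, is the case that the early return in xmlEncodeString() lets through unchanged.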
> I've included the XmlUtils.xmlEncodeString() method below, as well as a suggested fix for it.
> 
> I'm using the following:
> 
> Implementation-Title: Apache Axis
> Implementation-Version: 1.1 1021 June 13 2003
> Implementation-Vendor: Apache Web Services
> Java: JDK 1.4.1_02
> OS: Windows XP Professional Version 2002 SP1
> CPU: Intel Xeon 3.06GHz, 1.00 GB RAM
> 
> Any help/suggestions/recommendations would be helpful. Thanks!
> 
> Ryan Choi
> rchoi@salesforce.com
> 
> 
> ----------------------------------------------
> 
> Original from XmlUtils:
> 
>     public static String xmlEncodeString(String orig)
>     {
>         if (orig == null)
>         {
>             return "";
>         }
> 
>         char[] chars = orig.toCharArray();
> 
>         // if the string doesn't have any of the magic characters, leave
>         // it alone.
>         boolean needsEncoding = false;
> 
>         search:
>         for(int i = 0; i < chars.length; i++) {
>             switch(chars[i]) {
>             case '&': case '"': case '\'': case '<': case '>':
>                 needsEncoding = true;
>                 break search;
>             }
>         }
> 
>         if (!needsEncoding) return orig;
> 
>         StringBuffer strBuf = new StringBuffer();
>         for (int i = 0; i < chars.length; i++)
>         {
>             switch (chars[i])
>             {
>             case '&'  : strBuf.append("&amp;");
>                         break;
>             case '\"' : strBuf.append("&quot;");
>                         break;
>             case '\'' : strBuf.append("&apos;");
>                         break;
>             case '<'  : strBuf.append("&lt;");
>                         break;
>             case '\r' : strBuf.append("&#xd;");
>                         break;
>             case '>'  : strBuf.append("&gt;");
>                         break;
>             default   : 
>                 if (((int)chars[i]) > 127) {
>                         strBuf.append("&#");
>                         strBuf.append((int)chars[i]);
>                         strBuf.append(";");
>                 } else {
>                         strBuf.append(chars[i]);
>                 }
>             }
>         }
> 
>         return strBuf.toString();
>     }
> 
> Suggested fix for XmlUtils:
> 
>     public static String xmlEncodeString(String orig)
>     {
>         if (orig == null)
>         {
>             return "";
>         }
> 
>         char[] chars = orig.toCharArray();
> 
>         StringBuffer strBuf = new StringBuffer();
>         for (int i = 0; i < chars.length; i++)
>         {
>             switch (chars[i])
>             {
>             case '&'  : strBuf.append("&amp;");
>                         break;
>             case '\"' : strBuf.append("&quot;");
>                         break;
>             case '\'' : strBuf.append("&apos;");
>                         break;
>             case '<'  : strBuf.append("&lt;");
>                         break;
>             case '\r' : strBuf.append("&#xd;");
>                         break;
>             case '>'  : strBuf.append("&gt;");
>                         break;
>             case '\n' : // Line Feed is OK
>             case '\t' : // Tab is OK
>                         // These characters are specifically OK, as exceptions to
>                         // the general rule below. ('\r' is already handled by
>                         // the case above.)
>                         strBuf.append(chars[i]);
>                         break;
>             default   :
>                 if (((chars[i] >= 0x20) && (chars[i] <= 0xD7FF)) ||
>                     ((chars[i] >= 0xE000) && (chars[i] <= 0xFFFD))) {
>                         strBuf.append(chars[i]);
>                 }
>                 // For chars outside these ranges (such as control chars),
>                 // do nothing; it's not legal XML to print these chars,
>                 // even escaped
>             }
>         }
> 
>         return strBuf.toString();
>     }
> 
> 
> 
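One thing to watch with a per-char range check like the one quoted above: the chars 0xD800 to 0xDFFF are surrogates, so both halves of any character outside the Basic Multilingual Plane get dropped. Below is a rough sketch of the same filtering idea done per code point instead. It is not the Axis implementation; the class name XmlEncodeSketch and the helper isLegalXmlChar are invented for this example, it assumes Java 5 or later (StringBuilder, codePointAt), and like the suggested fix it silently drops illegal characters rather than reporting them.

```java
public class XmlEncodeSketch {

    // XML 1.0 "Char" production, checked on whole code points.
    static boolean isLegalXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Escapes markup characters and drops code points that XML 1.0 forbids.
    static String xmlEncodeString(String orig) {
        if (orig == null) {
            return "";
        }
        StringBuilder out = new StringBuilder(orig.length());
        for (int i = 0; i < orig.length(); ) {
            int cp = orig.codePointAt(i);
            i += Character.charCount(cp);
            switch (cp) {
            case '&'  : out.append("&amp;");  break;
            case '"'  : out.append("&quot;"); break;
            case '\'' : out.append("&apos;"); break;
            case '<'  : out.append("&lt;");   break;
            case '>'  : out.append("&gt;");   break;
            case '\r' : out.append("&#xd;");  break;
            default   :
                if (isLegalXmlChar(cp)) {
                    out.appendCodePoint(cp);
                }
                // Anything else (null, other control chars, lone
                // surrogates) is dropped, as in the suggested fix.
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The embedded null is removed; '<' and '&' are escaped.
        System.out.println(xmlEncodeString("a<b\u0000 & c")); // prints a&lt;b &amp; c
    }
}
```

Whether dropping is the right policy is a separate question; a caller that needs to know about bad input might prefer an exception over silent removal.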


=====
Davanum Srinivas - http://webservices.apache.org/~dims/
