tomcat-dev mailing list archives

From Jun Inamori <...@oop-reserch.com>
Subject URL Decoding for %XX%XX%XX
Date Sun, 24 Jun 2001 21:55:34 GMT
Hello,

Thank you for all your effort and enthusiasm for this cool software.

IE (at least IE 5.5 on Windows 95) encodes each Japanese character in the request URL as
three escapes:
   one Japanese character --> %XX%XX%XX
After decoding, this results in a corrupted URL String.
I'm not sure, but I remember Costin pointing out that this is bad behavior on IE's part.
Anyway, to serve static HTML files with Japanese file names, I really need to use Japanese
characters in my URLs.
If the Apache HTTP server could handle this, that would be enough for me.
But because I could not find a solution for Apache, I experimented with Tomcat 3.2.2 to solve
my issue.
I'll report my solution here. It is not very sophisticated, but it may help someone who
encounters a similar problem.
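
To make the %XX%XX%XX pattern concrete, here is a minimal sketch. It assumes the three escapes are the UTF-8 bytes of the character, which is consistent with the UTF-8 decoding used in the solution. The class and method names are hypothetical and not part of Tomcat. It shows that mapping each escape to a char on its own yields three unrelated characters, while collecting the bytes and decoding them together recovers the original one.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Tomcat.
final class PercentEscapeDemo {

    // Percent-encode every UTF-8 byte of s (one Japanese character -> %XX%XX%XX).
    static String encodeUtf8(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("%%%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Naive decode: turns each %XX escape into one char on its own.
    // Assumes the input consists only of well-formed %XX escapes.
    static String decodeEscapesAsChars(String escapes) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < escapes.length(); i += 3) {
            sb.append((char) Integer.parseInt(escapes.substring(i + 1, i + 3), 16));
        }
        return sb.toString();
    }

    // Byte-wise decode: collects all escaped bytes, then decodes them as UTF-8.
    // Same assumption about the input as above.
    static String decodeEscapesAsUtf8(String escapes) {
        byte[] bytes = new byte[escapes.length() / 3];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(escapes.substring(3 * i + 1, 3 * i + 3), 16);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String encoded = encodeUtf8("\u65e5");                       // one kanji (U+65E5)
        System.out.println(encoded);                                 // %E6%97%A5
        System.out.println(decodeEscapesAsChars(encoded).length());  // 3
        System.out.println(decodeEscapesAsUtf8(encoded).length());   // 1
    }
}
```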

As you know, the static HTML files are handled by
      org.apache.tomcat.request.StaticInterceptor.
StaticInterceptor uses:
      org.apache.tomcat.util.RequestUtil
to decode the encoded URL, and:
    public final static String URLDecode(String str)
is responsible for this task.
To get the right Java String from %XX%XX%XX, I modified this method.
It looks like this:
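    // Requires java.io.ByteArrayOutputStream, java.io.IOException and
    // java.io.UnsupportedEncodingException to be imported.
    public final static String URLDecode(String str)
        throws NumberFormatException, StringIndexOutOfBoundsException, IllegalArgumentException
    {
        // IE encodes one Japanese character as a sequence of three %XX
        // escapes, which results in a corrupted URL unless the bytes are
        // collected first and decoded together (as UTF-8, at the end).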

        if (str == null)  return  null;

        ByteArrayOutputStream baos = new ByteArrayOutputStream();

        int strPos = 0;
        int strLen = str.length();

        while (strPos < strLen) {
            int laPos;        // lookahead position

            // look ahead to next URLencoded metacharacter, if any
            for (laPos = strPos; laPos < strLen; laPos++) {
                char laChar = str.charAt(laPos);
                if ((laChar == '+') || (laChar == '%')) {
                    break;
                }
            }

            // if there were non-metacharacters, copy them all as a block
            if (laPos > strPos) {
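                // getBytes() uses the platform default encoding, which is
                // fine for the ASCII characters expected in an encoded URL
                byte[] nonmeta = str.substring(strPos, laPos).getBytes();
                baos.write(nonmeta, 0, nonmeta.length);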
                strPos = laPos;
            }

            // shortcut out of here if we're at the end of the string
            if (strPos >= strLen) {
                break;
            }

            // process next metacharacter
            char metaChar = str.charAt(strPos);
            if (metaChar == '+') {
                baos.write((int) ' ');
                strPos++;
                continue;
            } else if (metaChar == '%') {
                // decode the two hex digits after '%' into one byte value
                int some = Integer.parseInt(str.substring(strPos + 1, strPos + 3), 16);
                char c = (char) some;
                // refuse an encoded '/' or NUL
                if (c == '/' || c == '\0')
                    throw new IllegalArgumentException("URL contains encoded special chars.");
                baos.write(some);
                strPos += 3;
            }
        }
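        try {
            baos.flush();
            baos.close();
        }
        catch (IOException ex) {
            // not expected: ByteArrayOutputStream writes only to memory
        }
        String dec = null;
        try {
            // interpret the collected bytes as UTF-8
            dec = baos.toString("UTF-8");
        }
        catch (UnsupportedEncodingException ex) {
            // not expected: every Java platform must support UTF-8;
            // if it happens anyway, null is returned
        }
        return dec;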
    }
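
For comparison, on JDK 1.4 and later the standard java.net.URLDecoder.decode(String, String) also collects consecutive escaped bytes and decodes them in the given encoding. The sketch below assumes such a JDK is available. The class SafePathDecoder and its method are hypothetical names, not Tomcat code. It keeps the rejection of an encoded '/' or NUL from the method above, but reports malformed escapes as IllegalArgumentException rather than NumberFormatException.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical helper; not part of Tomcat.
final class SafePathDecoder {

    static String decodePath(String raw) {
        if (raw == null) return null;

        // Reject an encoded '/' (%2F) or NUL (%00) before decoding.
        for (int i = 0; i + 2 < raw.length(); i++) {
            if (raw.charAt(i) == '%') {
                String hex = raw.substring(i + 1, i + 3);
                if (hex.equalsIgnoreCase("2F") || hex.equals("00")) {
                    throw new IllegalArgumentException("URL contains encoded special chars.");
                }
                i += 2; // skip the two hex digits that were just checked
            }
        }

        try {
            // '+' becomes a space; runs of %XX are decoded together as UTF-8.
            return URLDecoder.decode(raw, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 must be supported by every Java platform", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(decodePath("a+b"));                      // a b
        System.out.println(decodePath("%E6%97%A5.html").length());  // 6
    }
}
```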

In addition, on my Linux box, StaticInterceptor fails to locate files with Japanese file
names.
I suppose this is not a very common problem.
It happens because the default encoding of the JVM does not match the encoding of the file
system on my Linux box.
(There, the default encoding of the JVM is always ISO-8859-1, but the file system uses
EUC-JP.)
To solve this problem, I added the following lines to:
    public int requestMap(Request req)
just before the call to: absPath = ctx.getRealPath

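        try {
            // Specify the encoding of your file system.
            // The bytes are turned back into a String with the JVM's default
            // encoding (ISO-8859-1 here), so each char carries one EUC-JP byte.
            pathInfo = new String(pathInfo.getBytes("EUC-JP"));
        }
        catch (UnsupportedEncodingException ex) {
            // EUC-JP is not available: pathInfo is left unchanged
        }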
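
Why this works can be checked in isolation. The sketch below is only an illustration with hypothetical names. Unlike the lines above, it passes the JVM default encoding explicitly instead of relying on the platform default. It assumes the JVM later converts the String to file-name bytes using that same default, and that the JDK ships the EUC-JP charset. Because ISO-8859-1 maps every byte to exactly one char and back, the bytes that reach the file system are the EUC-JP bytes of the original name.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical demo class; not part of Tomcat.
final class FileNameBytesDemo {

    // Encode the path in the file system's encoding, then wrap those bytes
    // in a String using the JVM default, one char per byte for ISO-8859-1.
    static String toFileSystemChars(String path, String fsEncoding, String jvmDefault) {
        byte[] fsBytes = path.getBytes(Charset.forName(fsEncoding));
        return new String(fsBytes, Charset.forName(jvmDefault));
    }

    public static void main(String[] args) {
        if (!Charset.isSupported("EUC-JP")) {
            System.out.println("EUC-JP is not available in this JDK");
            return;
        }
        String name = "\u65e5.html"; // one kanji followed by ".html"
        String hacked = toFileSystemChars(name, "EUC-JP", "ISO-8859-1");

        // What the JVM would write with ISO-8859-1 as its default encoding:
        byte[] written = hacked.getBytes(Charset.forName("ISO-8859-1"));
        byte[] wanted = name.getBytes(Charset.forName("EUC-JP"));
        System.out.println(Arrays.equals(written, wanted));
    }
}
```

If the JVM default were something other than a one-byte-per-char encoding such as ISO-8859-1, the round trip would not be guaranteed to preserve the bytes.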

Any questions and comments are welcome.

Best regards,
-- 

Happy Java programming!

Jun Inamori
OOP-Reserch
E-mail: jun@oop-reserch.com
URL:    http://www.oop-reserch.com/
