cocoon-users mailing list archives

From Udi Weinsberg <...@tochna.technion.ac.il>
Subject Re: Language Support in Request Parameters
Date Fri, 28 Sep 2001 15:29:58 GMT
Digging a bit into the code, I found the following piece (which is copied
from Tomcat, and altered a bit). It's from
src\org\apache\cocoon\environment\wrapper\RequestParameters.java

    /**
     * Decode the string
     */
    private String parseName(String s) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '+':
                    sb.append(' ');
                    break;
                case '%':
                    try {
                        sb.append((char) Integer.parseInt(
                                s.substring(i + 1, i + 3), 16));
                        i += 2;
                    } catch (NumberFormatException e) {
                        throw new IllegalArgumentException();
                    } catch (StringIndexOutOfBoundsException e) {
                        String rest  = s.substring(i);
                        sb.append(rest);
                        if (rest.length()==2)
                            i++;
                    }

                    break;
                default:
                    sb.append(c);
                    break;
            }
        }
        return sb.toString();
    }


This basically means that if it finds a %CC-encoded char, it simply
translates the CC into its char equivalent and appends it to the
resulting string. Isn't this right? It seems perfectly right, since the
DB works with the exact same hex values, and the only way to pass chars is
using their byte value (0-255). I really don't understand what I am
missing here.
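
One way to test this assumption is to compare the cast with a charset-aware
decode. The sketch below is a hypothetical stand-alone class
(PercentDecodeSketch and its method names are illustrative, not Cocoon or
Tomcat code). decodeByCast mimics the (char) cast in parseName, which maps
each %XX byte to the Unicode code point with the same number, so it behaves
like an ISO-8859-1 decode. decodeWithCharset collects the bytes first and
then decodes them with a named charset. The Hebrew branch assumes the browser
sent windows-1255 bytes and that the JVM ships that charset; neither is
confirmed in this thread.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper, not part of Cocoon: two ways of decoding %XX escapes. */
class PercentDecodeSketch {

    /** Mimics parseName: each %XX byte becomes the char with that code point. */
    static String decodeByCast(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '+') {
                sb.append(' ');
            } else if (c == '%') {
                sb.append((char) hexByte(s, i));
                i += 2; // skip the two hex digits just consumed
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Collects the raw bytes first, then decodes them with the given charset. */
    static String decodeWithCharset(String s, Charset cs) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '+') {
                bytes.write(' ');
            } else if (c == '%') {
                bytes.write(hexByte(s, i));
                i += 2;
            } else {
                bytes.write(c); // assumes unescaped characters are plain ASCII
            }
        }
        return new String(bytes.toByteArray(), cs);
    }

    /** Parses the two hex digits that follow the '%' at index i. */
    private static int hexByte(String s, int i) {
        if (i + 3 > s.length()) {
            throw new IllegalArgumentException("truncated escape at index " + i);
        }
        int hi = Character.digit(s.charAt(i + 1), 16);
        int lo = Character.digit(s.charAt(i + 2), 16);
        if (hi < 0 || lo < 0) {
            throw new IllegalArgumentException("bad hex digits at index " + i);
        }
        return (hi << 4) | lo;
    }

    public static void main(String[] args) {
        String raw = "%E0%E1";
        // The cast and an ISO-8859-1 decode give the same string.
        System.out.println(decodeByCast(raw)
                .equals(decodeWithCharset(raw, StandardCharsets.ISO_8859_1)));
        // Assumption: the form data is windows-1255 and the JVM supports it.
        if (Charset.isSupported("windows-1255")) {
            String hebrew = decodeWithCharset(raw, Charset.forName("windows-1255"));
            System.out.println(Integer.toHexString(hebrew.charAt(0)));
        }
    }
}
```

If the two results differ for your input, the cast is a likely place where
the Hebrew text turns into Latin-1 characters. Whether that is the actual
cause in C2 still needs checking.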

I am using the DBAddAction to add data to the database. My pipeline is
quite simple and looks like this:
AddPatient.xsp (has form) -> DBAddAction (using the request arguments) ->
ShowPatient.xsp (show the details by querying from the DB).

The problem is not in the DB or JDBC driver, since I am able to retrieve
data from the DB and insert it into a param argument. Perhaps I'll try to
see how this variable is set (as a simple string, in which base?).
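
To check how the value is actually stored, a small diagnostic like the
following can print the numeric value of each character. CodePointDump and
dump are hypothetical names, and how the parameter string is obtained is left
out on purpose. Assuming the form was submitted as windows-1255, a correctly
decoded alef-bet would show as 05D0 05D1, while the same bytes read as
ISO-8859-1 would show as 00E0 00E1.

```java
/** Hypothetical diagnostic helper, not part of Cocoon. */
class CodePointDump {

    /** Returns the UTF-16 code units of s as space-separated 4-digit hex. */
    static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Two Hebrew letters, as they should look after a correct decode.
        System.out.println(dump("\u05D0\u05D1"));
        // The same two bytes (0xE0, 0xE1) interpreted as ISO-8859-1.
        System.out.println(dump("\u00E0\u00E1"));
    }
}
```

Writing this dump to the log next to the raw parameter makes it easier to
tell a decoding problem from a display problem.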

An example (regardless of your i18n) would be great. Btw, did you document
the 'iw <-> he' problem I sent you?

If you are listed in the cocoon-dev list, perhaps you can post the problem
there. Let me know!


Thanks,
Udi.


On Fri, 28 Sep 2001, Piroumian, Konstantin wrote:

> > Hi!
> >
> > You said you had the same problem in Oracle. I'm using MySQL with the MM
> > JDBC driver, which expects the normal chars encoding (%xx). Even if I use
> > the serializer encoding (which I am about to try now), it will not do the
> > trick, it will only try to overcome an inherent problem (if it will work
> > at all):
> >
> > The serializer is the last stage of the output pipe, and it seems that the
> > problem is somewhere in the input pipe (meaning from the user to the
> > server and within the server). I can even see that the log file has
> > incorrect data.
> >
> > An interesting point is that when I retrieve data from the database into a
> > session argument (using the DBAuthAction), the data is inserted correctly
> > - meaning that I see it properly in the log file and in the resulting page
> > (both in plain Hebrew). As I looked into the code I saw that the Action
> > simply queries the db, and puts the result in the session param (am I
> > right, or is there any encoding modification here?). Thus, the results are
> > good, and I see my text as intended.
>
> So, the problem is not in the DB and JDBC driver, is it?
>
> >
> > However, if the data is provided from the user, then the data gets
> > corrupted (reencoded?? where in c2?) - It is the same corrupted data for
> > the database, the resulting html and the log file. So, I guess that the
> > problem is in the translation of the data. Since Tomcat does not translate
> > the data, then (as you said) the problem is with the C2 translation.
>
> Do you use actions to process the user input? Did you try to use
> java.net.URLDecoder.decode() before using params?
>
> >
> > 1. Where does this translation take place? (which file in the source)
>
> That depends on your pipeline. Maybe there is no translation at all. What does
> your pipeline look like?
>
> > 1.1 Why do we have this translation?
>
> As far as I remember, this happens because some servers use 8-bit encoded
> HTTP requests and do not correctly interpret Unicode streams. So, browsers
> URL-encode all characters above ASCII code 128 into %CC form. The same thing
> happens when the web server sends the response. Something like that, but I'm
> not sure that all of this is correct. See the Tomcat documentation and the
> Servlet specification for more info.
>
> > 1.2 Can't the C2 pipe model use an encoding other than UTF-8?
>
> I think that it's possible, because either Xerces or Xalan are able to
> process documents in different encodings. But I've never tried it, so I
> can't help you in this point.
>
> > 1.3 If so, how can I handle data that came FROM the database, and put it
> > into the session argument, the html and the log file??
>
> As you said above, you don't have problems with it now. Or did I get you wrong?
>
> > 2. How did you solve your problem with Oracle? How did you insert proper
> > data to the database? (actually, the problem is not with Oracle or the
> > driver but in C2 - I guess that this should be posted as a bug, no?)
>
> The problem was with C1 (not C2), and after we changed the Oracle DB
> encoding to UTF-8 everything worked as expected. But that was about a
> year ago...
>
> > 3. Is there a way I can bypass the translation and give it directly to the
> > DB action? This way, I won't need to make major changes in C2, and
> > everybody will be happy... :-)
>
> I think that you should try to find out where the data is changed. Anyway,
> try to decode the parameter.
>
> I don't think that I can provide much help, because I was away from C2 for a
> long time and it seems that I forgot many details. I'll try to provide an i18n
> sample with form data input and simple processing and then I'll be able to
> give you more definite answers.
>
> Konstantin
>
> >
> > Thanks,
> > Udi.
> >
> >
> > On Thu, 27 Sep 2001, Piroumian, Konstantin wrote:
> >
> > > Hi!
> > > See below...
> > >
> > > >
> > > > C2, WNT (hebrew enabled), Tomcat3.2.3, MySQL, IE5.5
> > > >
> > > > Hey!
> > > >
> > > > I'm trying to write an application that uses Hebrew in forms (meaning that
> > > > the user can insert Hebrew chars into form elements, mainly input boxes).
> > > > I guess that the problem is the same in any language which is encoded into
> > > > special html chars.
> > > >
> > > > I ran a simple application in tomcat (as a simple servlet) and in cocoon2,
> > > > which simply takes the data you entered in an input box (in Hebrew), places
> > > > it into a request parameter and then displays the request parameter from a
> > > > different page.
> > > >
> > > > In the post message, I saw that explorer is coding my chars correctly:
> > > > POST .....
> > > > host: ...
> > > >
> > > > UserName: %E0%D9%E3....
> > > >
> > > > When I ran the application on a Tomcat servlet - the results were good. I
> > > > saw the exact (hebrew) chars that I've written before.
> > > >
> > > > On the C2, however, the parameter did not show up correctly, and was coded
> > > > differently.
> > >
> > > Maybe you should try to configure your serializer to use the correct
> > > encoding?
> > > <map:serialize>
> > > <encoding>[HEBREW_ENCODING_NAME]</encoding>
> > > </map:serialize>
> > >
> > > >
> > > > The problem is greater when I try to insert data into MySQL db (which
> > > > expects the normal %XX encoding) and get garbage there as well.
> > >
> > > Is it the same garbage that you see on the screen? We had similar problems
> > > with JDBC drivers and Oracle.
> > >
> > > >
> > > >
> > > > Did anyone use C2 in an html-encoded language? Can you tell me what I need
> > > > to do to make it work?
> > > > Why is Tomcat working and C2 not? Where is the translation being
> > > > performed?
> > >
> > > I think that this happens because C2 uses Unicode (UTF-8) encoding for all
> > > internal transformations, while Tomcat operates on bytes and does not perform
> > > the extra encodings needed in C2.
> > >
> > > >
> > > > Thanks,
> > > > Udi.
> > > >
> > > >
> > > >
> > > > ---------------------------------------------------------------------
> > > > Please check that your question has not already been answered in the
> > > > FAQ before posting. <http://xml.apache.org/cocoon/faqs.html>
> > > >
> > > > To unsubscribe, e-mail: <cocoon-users-unsubscribe@xml.apache.org>
> > > > For additional commands, e-mail: <cocoon-users-help@xml.apache.org>
> > > >

