cocoon-dev mailing list archives

From Alexander Klimetschek
Subject URL decoding in blocks-fw
Date Sun, 04 Feb 2007 22:52:32 GMT
Hi Daniel,

I had some problems with the decoding of the URL in blocks-fw-impl, which I have fixed locally.
As I am still using the state of November/December and have not yet looked at the new service/servlet
code, I am sending you the problem and the fixed code snippets in this mail. Your new implementation
may have the same problem, or it may no longer exist. In that case please
disregard this email ;-)

The problem was that URLs whose query parameters contain special characters like = or &,
which are also used as delimiters inside the query string, did not work. For example, a
parameter value containing a & was decoded incorrectly. This happened because
the URL is decoded in BlockConnection.parseBlockURI() and then re-encoded. The entire query
string was decoded before the parameters were split, so a & encoded inside a parameter value
is treated as a parameter delimiter when the parameters are split afterwards. In
theory one encodes like this (the browser or the other code I am using does this automatically):

encode('a') + '=' + encode('value with &') + '&' + encode('b') + '=' + encode('value with =')

And during decoding you first split up into parameters, then into key and value and then decode
each separately.
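
To illustrate the ordering described above, here is a minimal, self-contained sketch. It is
not the blocks-fw code: java.net.URLEncoder and java.net.URLDecoder stand in for NetUtils (so
a space becomes '+'), and the names QueryOrderDemo, encodeQuery, splitThenDecode and
decodeThenSplit are made up for this example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

class QueryOrderDemo {

    // Encode every key and value on its own, then join them with the delimiters.
    static String encodeQuery(Map<String, String> params) {
        try {
            StringBuilder sb = new StringBuilder();
            for (Map.Entry<String, String> e : params.entrySet()) {
                if (sb.length() > 0) {
                    sb.append('&');
                }
                sb.append(URLEncoder.encode(e.getKey(), "UTF-8"))
                  .append('=')
                  .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            }
            return sb.toString();
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException(ex);
        }
    }

    // Intended order: split on '&', split on the first '=', decode each part.
    static Map<String, String> splitThenDecode(String rawQuery) {
        try {
            Map<String, String> result = new LinkedHashMap<String, String>();
            for (String pair : rawQuery.split("&")) {
                int pos = pair.indexOf('=');
                if (pos == -1) {
                    continue; // skip tokens without a key/value delimiter
                }
                result.put(URLDecoder.decode(pair.substring(0, pos), "UTF-8"),
                           URLDecoder.decode(pair.substring(pos + 1), "UTF-8"));
            }
            return result;
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException(ex);
        }
    }

    // Problematic order: decode the whole query string first, then split.
    static Map<String, String> decodeThenSplit(String rawQuery) {
        try {
            String decoded = URLDecoder.decode(rawQuery, "UTF-8");
            Map<String, String> result = new LinkedHashMap<String, String>();
            for (String pair : decoded.split("&")) {
                int pos = pair.indexOf('=');
                if (pos == -1) {
                    continue;
                }
                result.put(pair.substring(0, pos), pair.substring(pos + 1));
            }
            return result;
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        params.put("a", "value with &");
        params.put("b", "value with =");
        String query = encodeQuery(params);
        System.out.println(query);
        System.out.println(splitThenDecode(query));
        System.out.println(decodeThenSplit(query));
    }
}
```

With these two parameters, splitThenDecode returns both values intact, while decodeThenSplit
cuts the value of 'a' off at the decoded &.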

The URI class is not very helpful in this situation, since it is built on the concept
of encoding or decoding the entire query string. That's why I removed the use of the URI
constructors to re-construct a changed URL: they always encode the complete query string. If it is
already encoded (containing %xy), it gets encoded again, which is completely wrong. If you
use the decoded query string, it might contain decoded & or = inside the parameters that won't get
encoded again. Instead I create the URL manually to avoid this URI behaviour. See the code at the
end of this mail.
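
The URI behaviour described above can be reproduced with plain java.net.URI. The following is
only a sketch under assumptions: the example URL, the class name UriRebuildDemo and the
methods viaComponents and byHand are invented here and are not part of blocks-fw.

```java
import java.net.URI;
import java.net.URISyntaxException;

class UriRebuildDemo {

    // Rebuild the URI with a multi-argument constructor. These constructors
    // quote the query as a whole, and a '%' is always quoted again.
    static String viaComponents(URI original, String query) {
        try {
            return new URI(original.getScheme(), original.getAuthority(),
                           original.getPath(), query, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Rebuild the URI by concatenating the raw (still encoded) components.
    static String byHand(URI original) {
        StringBuilder sb = new StringBuilder();
        sb.append(original.getScheme()).append("://")
          .append(original.getRawAuthority())
          .append(original.getRawPath());
        // getRawQuery() returns null when the URI has no query part
        if (original.getRawQuery() != null) {
            sb.append('?').append(original.getRawQuery());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The single-argument form parses an already encoded URI as is.
        URI original = URI.create("http://example.org/p?a=x%26y");
        System.out.println(viaComponents(original, original.getRawQuery()));
        System.out.println(viaComponents(original, original.getQuery()));
        System.out.println(byHand(original));
    }
}
```

Passing the raw query to the constructor turns %26 into %2526, and passing the decoded query
leaves a bare & that now looks like a delimiter. Only the hand-built string keeps a=x%26y
unchanged. When building by hand, a missing query has to be checked explicitly, since
getRawQuery() returns null in that case.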

The other changes needed were creating the RequestParameters helper class from the non-decoded
raw query string in BlockCallHttpServletRequest:

         this.parameters = new RequestParameters(this.uri.getRawQuery());

and using NetUtils.decode inside the RequestParameters class (the encoding should be taken
from the appropriate Cocoon system property):

     public RequestParameters(String queryString) {
         this.names = new HashMap(5);
         if (queryString != null) {
             StringTokenizer st = new StringTokenizer(queryString, "&");
             while (st.hasMoreTokens()) {
                 String pair = st.nextToken();
                 int pos = pair.indexOf('=');
                 if (pos != -1) {
                     try {
                         // NOTE: the method name on this line was lost in the archive;
                         // setParameter is an assumed placeholder.
                         this.setParameter(
                                 NetUtils.decode(pair.substring(0, pos), "utf-8"),
                                 NetUtils.decode(pair.substring(pos+1, pair.length()), "utf-8"));
                     } catch (UnsupportedEncodingException e) {
                         throw new IllegalArgumentException(e);
                     }
                 }
             }
         }
     }

Finally the rewritten BlockConnection.parseBlockURI():

     private URI parseBlockURI(URI uri) throws URISyntaxException {
         // Can't happen
         if (!uri.isAbsolute()) {
             throw new URISyntaxException(uri.toString(),
                                          "Only absolute URIs are allowed for the block protocol.");
         }

         this.logger.debug("BlockSource: resolving " + uri.toString() + " with scheme " +
                 uri.getScheme() + " and ssp " + uri.getRawSchemeSpecificPart());

         URI subURI = new URI(uri.getRawSchemeSpecificPart());

         this.logger.debug("BlockSource: resolved to " + subURI.toString());

         this.blockName = subURI.getScheme();

         // All URIs, also relative are resolved and processed from the block manager
         // FIXME: This will not be a system global id, as the blockName is block local.

         // Manually build the URI because decoding and then re-encoding the query
         // does not work on the query-string level: it needs to be done after
         // each parameter pair (e.g. 'a=b') has been extracted and split into key
         // and value. Then both parts have to be decoded (they might contain not
         // only umlauts but also the delimiters '=' and '&'). But this is not
         // possible when using the URI constructors with multiple
         // parameters for each part (e.g. scheme, path, query) because the query
         // will always be encoded as an entire string. If you pass something that
         // is already encoded, e.g. has lots of %xy inside, it will mess up those
         // existing encodings. Thus you can only use the URI(String) constructor
         // for parsing *existing*, thus already encoded, URIs or build the URI
         // by hand like below:
         String ssp = this.blockName + ":" + subURI.getRawPath() + "?" + subURI.getRawQuery();

         // build a new URI that has the previous one only as scheme-specific part
         this.systemId = (new URI(uri.getScheme(), ssp, null)).toString();

         // again, build the URI manually so that the query part does not get mixed up
         return new URI(uri.getScheme() + ":" + subURI.getRawPath() + "?" + subURI.getRawQuery());
     }


Alexander Klimetschek
