tomcat-dev mailing list archives

From Eugen Kuleshov <a...@hco.kollegienet.dk>
Subject Re: Proposal: RequestImpl
Date Tue, 02 May 2000 00:55:48 GMT
Costin Manolache wrote:
 
> > > We need to implement getReader() anyway - can't get around that.
> > > We also need to at least respect the encoding if it is specified as part of
> > > the
> > > POST method - getParameters() must use the right encoding if specified.
> >
> >   But it can be wrong in some cases. Anyway, getParameters() for both
> > POST and GET should use the encoding set by the servlet developer
> > (request.setCharacterEncoding(String enc)).
> 
> > > Unfortunately there is no setCharacterEncoding method in ServletRequest,
> >
> >   but we still hope this will be added in JSDK 2.3
> 
> It would be great - but right now we're in 2.2 world and people are filing bugs.
> 
> Some situations are clear and we can resolve them in 2.2 - if the charset
> is specified in the Content-Type we need to use it.
> 
> I don't know if the user-specified encoding can have more priority than
> browser-specified encoding.
> 
> > > or getReader( encoding ) - so there is little we can do about that.
> >
> >   this is not necessary. It should use the encoding from
> > setCharacterEncoding
> 
> If the user is calling the method.

  Of course. If he is not calling it, the container should use an
approximated (guessed) encoding.
 
> BTW, how is the user supposed to guess the charset encoding ?
> There is no way to know what browser on what platform is used
> to access the resource.
> 
> If we don't know how to extract the encoding from headers how is
> the servlet developer supposed to do that ?

  For example, look at http://apache.lexa.ru/english/internals.html
  It describes the Russian Apache project (mod_charset for Apache).
 
> > > If we find a good way to "guess" ( even if it's complex - as long as we can
> > > keep it modular ) - I see no reason not to implement it. HTTP is supposed
> > > to be international, and in time ( I hope ) the browsers will have fewer
> > > bugs ( and use UTF ? ), eliminating the complex encoding code.
> > >
> > > I find this a very difficult problem, and I spent some time on this - asking
> > > servlet/JSP developers to deal with charsets  will not be easy.
> >
> >   It should be solved in the JSDK, not in the reference implementation.
> 
> ???

  Just look at the code in javax\servlet\http\HttpUtils.java
  
  In the parsePostData method:

  String postedBody = new String(postedBytes, 0, len, "8859_1");

  Is it right? I don't think so. But Tomcat 3.1 uses this code for
parsing request parameters.
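
  As a rough sketch of the alternative discussed in this thread (use the
charset from Content-Type when the request declares one), something like
the following could replace the hard-coded value. The class and method
names (PostBodyDecoder, charsetFromContentType, decodeBody) are made up
for illustration and are not part of HttpUtils or Tomcat. The fallback
simply mirrors the "8859_1" in the line quoted above and is an
assumption, not a recommendation.

```java
import java.nio.charset.Charset;
import java.util.Locale;

// Hypothetical helper, not part of javax.servlet or Tomcat.
public class PostBodyDecoder {
    // Assumed fallback: the same charset the quoted HttpUtils line hard-codes.
    static final String DEFAULT_CHARSET = "ISO-8859-1";

    /** Returns the charset parameter of a Content-Type value, or null if absent. */
    static String charsetFromContentType(String contentType) {
        if (contentType == null) {
            return null;
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type; the remaining parts are parameters.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = param.substring("charset=".length()).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    /** Converts the posted bytes to a String using the declared charset, if any. */
    static String decodeBody(byte[] postedBytes, int len, String contentType) {
        String name = charsetFromContentType(contentType);
        if (name == null) {
            name = DEFAULT_CHARSET;
        }
        // Charset.forName throws an unchecked exception for unknown charsets.
        return new String(postedBytes, 0, len, Charset.forName(name));
    }
}
```

  This only covers the case where the client states the charset. When
the header is missing, the choice of fallback is still open, which is
the "guessing" part of this discussion.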
 
> In some cases everything is clear, and we need to resolve those cases anyway.
> Char to byte and reverse will have to be done, the only question is how to
> detect the encoding. With or without user-supplied encoding - we need
> to deal with that.

  If the encoding is not set anywhere in the request (for example, in a
POST), the container should use the JVM encoding set by the webmaster (or
maybe the one from the JVM property file.encoding).
  But don't forget about the GET method and decoding characters from %xx
using the correct encoding, because each 8-bit encoding will produce
completely different results.
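
  To illustrate the %xx point, here is a small sketch that decodes the
same escaped query value with two different charsets. It uses
java.net.URLDecoder.decode(String, String) from current JDKs; the class
name QueryDecodeDemo and the containerDefault() helper are hypothetical,
and UTF-8 and ISO-8859-1 were picked only because every JVM is required
to support them. Any pair of 8-bit encodings shows the same effect.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class, not part of any servlet container.
public class QueryDecodeDemo {

    /** Decodes %xx escapes (and '+') by interpreting the bytes in the given charset. */
    static String decode(String escaped, String charset) {
        try {
            return URLDecoder.decode(escaped, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    /** Assumed fallback when the request says nothing: the JVM's file.encoding. */
    static String containerDefault() {
        return System.getProperty("file.encoding", "ISO-8859-1");
    }

    public static void main(String[] args) {
        String escaped = "%C3%A9"; // two escaped bytes: 0xC3 0xA9

        // As UTF-8 the two bytes form one character (U+00E9).
        System.out.println(decode(escaped, "UTF-8").length());

        // As ISO-8859-1 each byte becomes its own character (U+00C3, U+00A9).
        System.out.println(decode(escaped, "ISO-8859-1").length());

        // What a container might do if nothing in the request names a charset.
        System.out.println(decode(escaped, containerDefault()));
    }
}
```

  The escaped bytes are identical in both calls; only the charset
passed to the decoder changes the resulting parameter value.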
 
> The HTTP spec is also clear - and some browsers are implementing it.
> We need to observe the charset encoding in the HTTP protocol - if specified,
> and it's a bug if we don't.
> 
> Guessing - it's another story, probably you're right about that.

  Eugen Kuleshov.
