From: Costin Manolache
To: tomcat-dev@jakarta.apache.org
Date: Mon, 01 May 2000 12:47:43 -0700
Subject: Re: Proposal: RequestImpl

Jun Inamori wrote:
> Hello,
>
> 'HttpServletRequest.getParameter(key)' can't return the correct
> parameter string, when the original character sequence contains the 2
> bytes characters, such as Japanese character.
> As you know, 2 bytes characters are encoded like:
> "%82%B1%82%F1%82%C9%82%BF%82%ED"
> The first Japanese character consists of '82' and 'B1' and the second of
> '82' and 'F1'.

Hi,

Thanks for this very good proposal. We do have a lot of problems with
character-to-byte and byte-to-character conversions and with encodings.

I have a few comments/questions:

- getLocale(): Locales are constructed from the Accept-Language: header,
  and if you look at RequestUtil you'll notice the code is very
  "expensive": many objects are created, the parsing is complex, a new
  Locale object is allocated (which creates a few other objects and has
  a slow init time), etc.
  I don't think it's a good idea to use it at the engine level. Servlets
  can use it, but if it becomes part of the critical loop I would like
  something faster.

  I agree that the right way to get the encoding is from the
  Accept-Language: header _and_ the Content-Type charset, if available
  (this is not part of your proposal, but I think it has to be used when
  present!). If Content-Type carries no charset, I think we need an
  optimized version of the code that gets the JavaEnc, possibly without
  going through Locale (i.e. parse only the first component of the
  header with simple code and use it directly).

  (Accept-Language is important for the output too, but I agree it's a
  reasonable guess for the input when no charset is specified in
  Content-Type.)

- Decoding with ByteArrayOutputStream is very expensive in terms of
  garbage collection (GC), and GC is currently the main performance
  problem in Tomcat. We will also need to decode if the user calls
  getReader(). I think we need a way to reuse objects and avoid
  excessive use of Strings; ByteArrayOutputStream also creates byte[]
  buffers, which means more GC.

  One good approach that your proposal doesn't cover is to use
  Readers/Writers. I'm still looking for a way to reuse Reader/Writer
  instances (they allocate byte[] buffers too, plus encoders and
  decoders). A pool of Readers/Writers acting as encoders/decoders might
  do the trick, or we could reimplement encoding/decoding in a reusable
  way. (XML projects such as Xalan and Crimson use optimized byte/char
  converters for common encodings, with little GC and fast execution.)

- I know this is a very important issue and we need to find a good
  solution, but it's important to do it cleanly. I can understand what
  happens by reading the code, but it's not easy (I'm talking about the
  Tomcat code, not yours). If we can factor out the encoding/decoding,
  everything will probably be much simpler.

- Can you send a diff? It's much easier to read and patch.

Costin
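As a rough illustration of the "use the Content-Type charset, else parse only the first Accept-Language component" idea, here is a minimal Java sketch that never builds a Locale. RequestCharsetGuesser, its language-to-charset table and the ISO-8859-1 fallback are assumptions made up for this example, not Tomcat code; it also ignores q-values and simply trusts the first Accept-Language entry.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper (not Tomcat API): picks a charset name for request
// parameters. Order: charset= from Content-Type, else first Accept-Language
// tag mapped through a small table, else a default.
final class RequestCharsetGuesser {
    // Assumed, illustrative mapping; a real server would make this configurable.
    private static final Map<String, String> LANG_TO_CHARSET = new HashMap<>();
    static {
        LANG_TO_CHARSET.put("ja", "Shift_JIS");
        LANG_TO_CHARSET.put("ko", "EUC-KR");
        LANG_TO_CHARSET.put("zh", "GB2312");
        LANG_TO_CHARSET.put("ru", "KOI8-R");
    }
    static final String DEFAULT = "ISO-8859-1";

    static String guess(String contentType, String acceptLanguage) {
        String cs = charsetFromContentType(contentType);
        if (cs != null) return cs;
        String lang = firstLanguage(acceptLanguage);
        String mapped = (lang == null) ? null : LANG_TO_CHARSET.get(lang);
        return (mapped != null) ? mapped : DEFAULT;
    }

    // Case-insensitive scan for "charset=" without lower-casing the header.
    static String charsetFromContentType(String ct) {
        if (ct == null) return null;
        int n = ct.length();
        for (int i = 0; i + 8 <= n; i++) {
            if (ct.regionMatches(true, i, "charset=", 0, 8)) {
                int start = i + 8;
                int end = start;
                while (end < n && ct.charAt(end) != ';' && ct.charAt(end) != ' ') end++;
                String cs = ct.substring(start, end);
                // Strip optional surrounding quotes: charset="EUC-JP"
                if (cs.length() >= 2 && cs.charAt(0) == '"' && cs.charAt(cs.length() - 1) == '"') {
                    cs = cs.substring(1, cs.length() - 1);
                }
                return cs.isEmpty() ? null : cs;
            }
        }
        return null;
    }

    // Returns the primary tag of the first entry, e.g. "ja" from "ja-JP,en;q=0.5".
    static String firstLanguage(String al) {
        if (al == null) return null;
        int n = al.length();
        int start = 0;
        while (start < n && al.charAt(start) == ' ') start++;
        int end = start;
        while (end < n) {
            char c = al.charAt(end);
            if (c == ',' || c == ';' || c == '-' || c == ' ') break;
            end++;
        }
        return (end == start) ? null : al.substring(start, end).toLowerCase(Locale.ROOT);
    }
}
```

The returned name could then be resolved once with Charset.forName and cached, so the per-request cost is a short string scan plus a map lookup.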
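On the GC point, a minimal sketch of decoding parameter values without a per-call ByteArrayOutputStream: a hypothetical ReusableParamDecoder, one per charset, keeps its byte[] buffer, char buffer and CharsetDecoder across calls and grows them only when a longer value arrives. This is not how Tomcat or the RequestImpl proposal implements decoding. It uses java.nio.charset, which appeared in JDK 1.4 (after this message), and it is not thread-safe, so it assumes one instance per worker thread or a small pool as suggested above.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper, not Tomcat's RequestUtil: URL-decodes parameter values
// for one charset while reusing its buffers and CharsetDecoder between calls.
// Not thread-safe: assumes one instance per worker thread or a pooled instance.
final class ReusableParamDecoder {
    private final CharsetDecoder decoder;
    private byte[] bytes = new byte[256];
    private ByteBuffer byteView = ByteBuffer.wrap(bytes);
    private CharBuffer chars = CharBuffer.allocate(256);

    ReusableParamDecoder(Charset charset) {
        // Replace bad input with U+FFFD instead of throwing.
        this.decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
    }

    String decode(String s) {
        int n = s.length();
        // Each input char yields at most one byte, so n bytes is always enough.
        if (bytes.length < n) {
            bytes = new byte[n];
            byteView = ByteBuffer.wrap(bytes);
        }
        int len = 0;
        for (int i = 0; i < n; i++) {
            char c = s.charAt(i);
            if (c == '+') {
                bytes[len++] = (byte) ' ';
            } else if (c == '%' && i + 2 < n) {
                int hi = Character.digit(s.charAt(i + 1), 16);
                int lo = Character.digit(s.charAt(i + 2), 16);
                if (hi >= 0 && lo >= 0) {
                    bytes[len++] = (byte) ((hi << 4) | lo);
                    i += 2;
                } else {
                    bytes[len++] = (byte) c; // malformed escape: keep '%' literally
                }
            } else {
                // Assumes unescaped characters are ASCII, as in a well-formed query string.
                bytes[len++] = (byte) c;
            }
        }
        // Size the char buffer for the worst case; reuse it when it is big enough.
        int maxChars = (int) Math.ceil(len * (double) decoder.maxCharsPerByte());
        if (chars.capacity() < maxChars) {
            chars = CharBuffer.allocate(maxChars);
        }
        byteView.clear();
        byteView.limit(len);
        chars.clear();
        decoder.reset();
        decoder.decode(byteView, chars, true);
        decoder.flush(chars);
        chars.flip();
        return chars.toString(); // the result String is the main per-call allocation
    }
}
```

With a Shift_JIS instance, the sample value from the proposal, "%82%B1%82%F1%82%C9%82%BF%82%ED", decodes to the five characters こんにちわ: each %XX pair becomes one byte first, and only then are the 2-byte sequences turned into characters, which is exactly the step a char-by-char decoder gets wrong.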