tomcat-dev mailing list archives

Subject Re: MessageBytes /
Date Tue, 29 Aug 2000 19:59:19 GMT
>  - performance is a good goal but why couldn't it be done behind the
> scenes?  (I think the answer is probably "because String is final"
> right?)

Short answer: Yes.

- byte[] -> String conversion can be delayed. 
- byte[] -> String conversion can be avoided in some cases. Not all
servlets will actually use getHeaders() - most servlets don't do that.
Since all important internal interceptors will use MessageBytes we'll
not generate any string in most cases.
- ASCII conversion can be optimized ( the default converters allocate
temporary buffers that can't be easily reused ). Most HTTP stuff we need
is ASCII.
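
A minimal sketch of the first two points, assuming a hypothetical
LazyBytes class ( this is not the real MessageBytes code, and the names
are made up for illustration ). It keeps a reference to the received
bytes and creates a String only the first time someone asks for one; if
nobody asks, no String is ever allocated.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration, not Tomcat's MessageBytes.
final class LazyBytes {
    private final byte[] buf;   // the received buffer, shared, not copied
    private final int off;
    private final int len;
    private String cached;      // stays null until a String is requested
    private int conversions;    // counts real conversions, for the demo only

    LazyBytes(byte[] buf, int off, int len) {
        this.buf = buf;
        this.off = off;
        this.len = len;
    }

    // Converts on first use and caches the result. The sketch assumes a
    // single charset per value; later calls return the cached String.
    String toString(Charset cs) {
        if (cached == null) {
            cached = new String(buf, off, len, cs);
            conversions++;
        }
        return cached;
    }

    int conversions() {
        return conversions;
    }

    public static void main(String[] args) {
        byte[] line = "Accept: text/html".getBytes(StandardCharsets.US_ASCII);
        LazyBytes value = new LazyBytes(line, 8, 9);
        System.out.println(value.conversions());   // 0, nothing converted yet
        System.out.println(value.toString(StandardCharsets.ISO_8859_1));
        value.toString(StandardCharsets.ISO_8859_1);
        System.out.println(value.conversions());   // 1, second call hit the cache
    }
}
```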

>  - Strings are Unicode; why are they unacceptable for non-ascii
> charsets?  Or, why isn't MessageBytes MessageChars instead?  Then the
> conversion could/would be done at the last minute after all.

The purpose is to manipulate the received bytes. With String you can't
delay the conversion ( the charset is needed by the
String(byte[]) constructor ).
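
To illustrate why the charset has to be known before a String is built,
here is a small sketch ( the DecodeLater class and its decode helper are
hypothetical names ). The same two bytes give different Strings
depending on the charset, so converting too early with the wrong guess
produces the wrong text.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration: identical bytes, different decoded text.
final class DecodeLater {
    // Decoding is only possible once a charset has been chosen.
    static String decode(byte[] raw, Charset cs) {
        return new String(raw, cs);
    }

    public static void main(String[] args) {
        // 0xC3 0xA9 is the UTF-8 encoding of U+00E9 (e with acute accent).
        byte[] raw = { (byte) 0xC3, (byte) 0xA9 };

        String asUtf8 = decode(raw, StandardCharsets.UTF_8);
        String asLatin1 = decode(raw, StandardCharsets.ISO_8859_1);

        System.out.println(asUtf8.length());    // 1
        System.out.println(asLatin1.length());  // 2 (U+00C3 followed by U+00A9)
    }
}
```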

I don't mind calling it MessageChars ( or any other name :-).

>  - MessageBytes seems to have exactly the same API as String; is the
> main (only?) reason to use it because we have direct access to its
> byte array?

See above. 
The API is slightly different - the focus is on converting the byte[] to
String and other types in an optimal way.

The startsWith, etc. methods are a result of refactoring - we may remove
them, but they were part of the original code and are very useful.
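
As a rough idea of what a byte-level startsWith can look like, here is a
sketch under the assumption that the prefix is plain ASCII ( ByteMatch
is a hypothetical name, not the actual method in the code ). It compares
against the buffer directly, so no String is created for the received
data.

```java
// Hypothetical illustration of prefix matching on raw bytes.
final class ByteMatch {
    // Returns true if buf[off .. off+len) begins with the given prefix.
    // The prefix is assumed to contain only ASCII characters.
    static boolean startsWith(byte[] buf, int off, int len, String prefix) {
        int n = prefix.length();
        if (n > len) {
            return false;
        }
        for (int i = 0; i < n; i++) {
            // Mask to compare the byte as an unsigned value.
            if ((buf[off + i] & 0xff) != prefix.charAt(i)) {
                return false;
            }
        }
        return true;
    }
}
```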

>  - Aren't HTTP headers *required* to be plain-ascii?  Any character
> encoding has to be done inside the body, not the headers, right?  So
> why do you care about supporting non-ascii chars in
> Or do you only use it for performance?

No, according to HTTP/1.1 only the header name is a "token" ( i.e. CHAR ==
0..127 ). The value is TEXT ( any octet except CTLs ) and it uses the same
encoding rules.

( see all the "From: " headers from .jp, .kr, .fr,etc people )
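
The difference between the two rules can be sketched like this ( the
HeaderChars class is hypothetical, and the check is a simplification: it
accepts a tab inside TEXT but does not handle folded CRLF lines ). A
name must be 7-bit with no control or separator characters, while a
value may carry any octet above 127.

```java
// Hypothetical illustration of the HTTP/1.1 "token" and TEXT rules.
final class HeaderChars {
    // Separator characters that are not allowed in a token.
    private static final String SEPARATORS = "()<>@,;:\\\"/[]?={} \t";

    static boolean isCtl(int c) {
        return c <= 31 || c == 127;
    }

    // token: one or more CHARs (0..127), excluding CTLs and separators.
    static boolean isToken(byte[] b) {
        if (b.length == 0) {
            return false;
        }
        for (byte x : b) {
            int c = x & 0xff;
            if (c > 127 || isCtl(c) || SEPARATORS.indexOf(c) >= 0) {
                return false;
            }
        }
        return true;
    }

    // TEXT: any octet except CTLs. Simplified: tab is allowed, CR/LF are not.
    static boolean isText(byte[] b) {
        for (byte x : b) {
            int c = x & 0xff;
            if (isCtl(c) && c != '\t') {
                return false;
            }
        }
        return true;
    }
}
```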

It's also useful to use the same mechanism for headers and the URL ( which
is received first ).

For servlet 2.3 the encoding is known only when setEncoding is called (
unless the browser is kind enough ).

> This isn't meant as a challenge -- I trust you had your reasons -- but
> just a request for clarification.

I'm glad someone is reviewing it. It's a very important change.

I couldn't find any better solution for supporting other charsets (
i.e. conversion of the URL and header content must be delayed until the
encoding is known - either from setEncoding or guessed from what the
browser sends ).

Performance is a big bonus ( and gives reason to believe this is the right
thing to do ).

Please note that the changes in MimeHeaders have another motivation
too: code readability. The code was fine, but too many methods had
accumulated over time. The fact that nobody touched the code is a good
argument that the code was too complex.

