tomcat-dev mailing list archives

From "Alec Yu" <alec...@msa.hinet.net>
Subject Re: [Proposal] Default Encoding option for JSP/Tomcat in server.xml or web.xml
Date Sat, 12 May 2001 17:24:45 GMT
From: "Craig R. McClanahan" <craigmcc@apache.org>
> Servlet Specification 2.3 (Proposed Final Draft 2), Section 5.4 (p. 44):
> 
>     'The default encoding of a response is "ISO-8859-1"
>     if none has been specified by the servlet programmer.'
I am a servlet programmer too, so why can't I specify it in the container configuration files? *giggle*
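To make the cost of that default concrete, here is a minimal, stdlib-only Java sketch of what happens when CJK text is written with ISO-8859-1. The class and method names are illustrative only, not part of any servlet or Tomcat API; the point is that `String.getBytes` silently replaces every character ISO-8859-1 cannot represent with `?`.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ResponseEncodingDemo {
    // Illustrative helper: encode response text with the given charset.
    // String.getBytes(Charset) replaces unmappable chars with the charset's
    // replacement bytes, which for ISO-8859-1 is '?' (0x3F).
    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    public static void main(String[] args) {
        // U+4E2D U+6587 ("Chinese"): neither character exists in ISO-8859-1.
        String text = "\u4e2d\u6587";

        byte[] latin1 = encode(text, StandardCharsets.ISO_8859_1);
        byte[] utf8 = encode(text, StandardCharsets.UTF_8);

        // The spec default turns the text into question marks.
        System.out.println(new String(latin1, StandardCharsets.ISO_8859_1)); // ??
        // A charset that covers the text keeps all of it (3 bytes per char here).
        System.out.println(utf8.length); // 6
    }
}
```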

> Providing container-level overrides for this would seem to break the spec,
> and any application that depended on that feature would not be portable
> to other containers.
Suppose we are developing a web product in JSP, targeting three markets
(say, Japan, Taiwan and Korea).

Meanwhile, our product interoperates with some other servlet/JSP-based
products from third-party vendors.

The concern is:
If there is no way to set a default encoding in web.xml, server.xml or whatever
configuration files the servlet/JSP engine uses, then we have to modify not only
our own code and pages, but also those from other vendors.

Worse still, what about servlets that come without source code?

Let's look at a real example (my personal web site):
Sun's Brazil web server acts as the front-end web server (because it is lightweight and
responds faster), with my own Brazil-to-Tomcat connector (which invokes servlets/JSP pages
via direct Java calls, not via socket connections).

Everything was fine until Jive (a free forum system written in JSP and beans) was added to
the system.
Wow. Jive can't handle Big5, Shift_JIS, GB2312 or anything else, not even UTF-8; only
ISO-8859-1 works.

Hell, should I modify Jive again and again and again, when Jive updates so often?
What about new custom Jive skins from around the world?
What about other third-party JSP pages?

The servlet/JSP specifications give me the feeling that
they address only L10N problems, not I18N problems.

> > This seems to work at first, as long as you don't treat strings read
> > from GET/POST parameters as Unicode strings, because they are NOT
> > VALID UNICODE STRINGS. Web output generated from servlets/JSP pages
> > may be right, simply because contents in these NOT VALID UNICODE
> > STRINGS are converted into bytes again by simply doing char->byte
> > typecasting.
> For GET requests, there are not very many good solutions because the
> request itself does not include information about the character encoding
> that was used on the request URI.
Yes, I read something along these lines years ago, explaining why there is no standard for
determining the encoding of GET parameters.
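A small stdlib-only Java sketch of why GET parameters are ambiguous: the same percent-encoded bytes decode to different strings depending on which charset the server assumes, and the request line itself does not say which one is right. The class and helper names are hypothetical; `java.net.URLDecoder.decode(String, String)` is the standard JDK call.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class QueryDecodingDemo {
    // Hypothetical helper: decode one percent-encoded query value using a
    // charset chosen by the server. For GET this choice is always a guess.
    static String decodeQueryValue(String raw, Charset assumed) {
        try {
            return URLDecoder.decode(raw, assumed.name());
        } catch (UnsupportedEncodingException e) {
            // Not expected: the name comes from an existing Charset object.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Suppose the browser sent the UTF-8 bytes of U+4E2D.
        String raw = "%E4%B8%AD";

        String asUtf8 = decodeQueryValue(raw, StandardCharsets.UTF_8);
        String asLatin1 = decodeQueryValue(raw, StandardCharsets.ISO_8859_1);

        System.out.println(asUtf8.equals("\u4e2d"));  // true: the intended character
        System.out.println(asLatin1.length());         // 3: three unrelated Latin-1 chars
    }
}
```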

> Could you point me specifically to the byte->char/char->byte code that you
> are concerned about?
Hmm... thank you for all the explanations.
Indeed, what I was talking about is not broken.
After following the spec more closely, it works now.
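For readers following the byte->char / char->byte point quoted above, here is a minimal Java sketch (stdlib only, names illustrative) of why such output "looks right": bytes decoded as ISO-8859-1 map one char per byte, so casting each char back to `byte` reproduces the original bytes, even though the string does not hold the intended characters. Re-interpreting the bytes with the real charset is one way to recover the text once that charset is known; UTF-8 stands in here for Big5 or Shift_JIS.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1RoundTripDemo {
    // Decoding as ISO-8859-1 maps each byte 0x00..0xFF to the char with the same value.
    static String decodeAsLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // The naive char->byte typecasting: keeps only the low 8 bits of each char.
    static byte[] typecastToBytes(String s) {
        byte[] out = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            out[i] = (byte) s.charAt(i);
        }
        return out;
    }

    // Recover the real text by re-reading the same bytes with the actual charset.
    static String reinterpret(String latin1Decoded, Charset actual) {
        return new String(latin1Decoded.getBytes(StandardCharsets.ISO_8859_1), actual);
    }

    public static void main(String[] args) {
        byte[] sent = "\u4e2d\u6587".getBytes(StandardCharsets.UTF_8); // 2 chars, 6 bytes
        String param = decodeAsLatin1(sent);

        System.out.println(param.length());                              // 6, not 2
        System.out.println(Arrays.equals(typecastToBytes(param), sent)); // true: output "looks right"
        System.out.println(reinterpret(param, StandardCharsets.UTF_8).equals("\u4e2d\u6587")); // true
    }
}
```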

> You are obviously free to do this kind of special connector, and/or modify
> Tomcat to meet your needs -- but you're also making yourself dependent on
> conventions that are contrary to the servlet and JSP specifications.  Any
> apps you write that depend on this behavior won't run on any other servers
> that implement the standards.  You might want to look at standards based
> alternatives to at least some of the issues that you have raised.
I'm just curious why the standard specifications cost people here so much maintenance time,
simply because they don't let us specify default encodings for compile time, input time
and runtime once, in a few configuration files, but instead force us to specify them in every
page and every servlet. Meanwhile, since our products interoperate with code and pages
from other people, we have to ask their developers:
Could you send us a copy of your source code/pages?
Could you take character encodings other than the one you use into account?
Could you ...

What kind of I18N solution is this?
Sure, UTF-8 greatly eases the problems of input and output, but it does not solve
the maintenance problem with other people's code and pages. And not everybody is willing
to use UTF-8 as their default encoding, because only a few tools can edit UTF-8
documents (let's forget M$ FrontPage, whose JSP support is poor anyway;
Dreamweaver is great but lacks UTF-8 support; Amaya has poor DBCS support,
not to mention JSP; even among plain text editors, few support UTF-8).

You know, many, if not most, JSP pages around the world come with no page contentType directive,
and many servlets neither specify their own character encoding nor provide an option
in a configuration file to do so. The real nightmare is not our own servlets and pages, but
other people's.
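To make the proposal concrete, here is a purely hypothetical Java sketch of the lookup order a container-level default would imply: a charset given by the page or servlet wins, then a configured default (the proposed server.xml/web.xml option, which does not exist in the spec), then the spec's ISO-8859-1. None of these names come from Tomcat or the servlet API; the `contentType` parsing is deliberately simple.

```java
import java.util.Locale;

public class CharsetFallbackSketch {
    // Spec default when nothing is specified (Servlet 2.3 PFD2, section 5.4).
    static final String SPEC_DEFAULT = "ISO-8859-1";

    // Extract the charset parameter from a contentType value, or null if absent.
    static String charsetFrom(String contentType) {
        if (contentType == null) return null;
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = p.substring("charset=".length()).trim();
                // Strip optional quotes, e.g. charset="Big5".
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    // Hypothetical resolution: page/servlet setting, then configured default, then spec default.
    static String resolve(String pageContentType, String configuredDefault) {
        String fromPage = charsetFrom(pageContentType);
        if (fromPage != null) return fromPage;
        if (configuredDefault != null && !configuredDefault.isEmpty()) return configuredDefault;
        return SPEC_DEFAULT;
    }

    public static void main(String[] args) {
        System.out.println(resolve("text/html; charset=Shift_JIS", "Big5")); // Shift_JIS
        System.out.println(resolve("text/html", "Big5"));                    // Big5
        System.out.println(resolve(null, null));                              // ISO-8859-1
    }
}
```

Under this ordering, third-party pages without a contentType directive would pick up the site-wide setting, while pages that do declare a charset keep behaving exactly as the spec describes.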

ps.
I am a newbie and don't know how to submit code to Apache projects.
I installed JAMES 1.2.1 on my personal web site and found that it garbled 8-bit MIME mail headers.
I fixed that, and added an SMTP AUTH LOGIN function to its SMTP handler
(so that you can use a matcher to allow mail relay by checking accounts rather than IP addresses).
I'd like to contribute this feature to JAMES. What should I do, given that I'm not an Apache member?
