jakarta-taglibs-dev mailing list archives

From Anthony Tagunov <tagu...@newmail.ru>
Subject Re[2]: [standard-EA3 comment]: setLocale vs setCharacter encoding
Date Mon, 14 Jan 2002 23:06:32 GMT
Hello Jan!

You as the author of the "..standard.tag.common.fmt.*"
are the _best_ person I could expect to receive a reply from! :-))

And thanks for such an elaborate letter!

>> Is this the correct forum to send comments on
>> Standard Taglib early access 3?
JL> Absolutely!
>> 1) some servlet containers do not set any character
>>    encoding after response.setLocale()
>>    (this is true at least for Tomcat of all versions
>>     and for Weblogic versions 6.0sp1 and 6.1,
>>     for Locale.JAPAN and new Locale("ru","RU"))
>> 2) some servlet containers do set a character
>>    encoding after response.setLocale(), but
>>    for example for new Locale("ru","RU")
>>    they set ISO-8859-1, that is not
>>    desirable, as windows-1251 or koi8-r are
>>    in wide practical use, not ISO-8859-1.
>> So what could the solution be?
>> One option is to include the character
>> set name into the resource bundle, under
>> some special key, like
>> javax.servlet.jsp.jstl.i18n.(request.)charset,
>> this will help when the locale is set
>> accordingly to the bundle in automatic mode.
>> Also, the <locale> tag may be given the
>> <locale charset="windows-1251"> attribute.
>> In any case the selected charset could be stored
>> to the
>> javax.servlet.jsp.jstl.i18n.request.charset
>> attribute of the appropriate scope
>> as it is done now.
>> Every time the locale is
>> retrieved from the
>> javax.servlet.jsp.jstl.i18n.locale
>> attribute and passed to response.setLocale() the value from
>> javax.servlet.jsp.jstl.i18n.(request.)charset
>> could also be passed to response.setContentType().
>> Unfortunately this would require setting the contentType
>> also somewhere.. Not swell, but looks quite necessary.

JL> I think the solution you are proposing only addresses the case where
JL> browser-sensing capabilities for locales are disabled by setting the
JL> javax.servlet.jsp.jstl.i18n.locale scoped attribute (either directly
JL> by application code or via the <locale> action) or context
JL> configuration parameter.

Well, the autodetection case is the most interesting case, and actually I
proposed a partial solution:

if the locale is determined
- by the <bundle> action or by a <message> action, or
- by the <formatNumber> or <formatDate> actions that make use of
  javax.servlet.jsp.jstl.i18n.basename attribute
then we could put the charset name _into_ the bundle, under some
special key, for example like this:

If the locale is determined by the <formatNumber> or <formatDate>
actions in the absence of the javax.servlet.jsp.jstl.i18n.basename
attribute, this mechanism won't work of course, as no bundle is engaged.
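
To make the bundle idea more concrete, here is a minimal sketch of how a tag could read the charset from the bundle and build a content type from it. The key name is the one floated in this thread and is not part of any spec; the class names, the sample bundles and the contentTypeFor helper are invented for illustration only. A real tag would pass the result to response.setContentType(), which is left out here so the sketch runs without a servlet container.

```java
import java.nio.charset.Charset;
import java.util.ListResourceBundle;
import java.util.MissingResourceException;
import java.util.ResourceBundle;

class BundleCharsetSketch {
    // Key name as proposed in this thread; an assumption, not a standard key.
    static final String CHARSET_KEY = "javax.servlet.jsp.jstl.i18n.charset";

    // Hypothetical Russian bundle that carries its preferred charset.
    static class MessagesRu extends ListResourceBundle {
        @Override
        protected Object[][] getContents() {
            return new Object[][] {
                {"greeting", "\u041f\u0440\u0438\u0432\u0435\u0442"},
                {CHARSET_KEY, "windows-1251"},
            };
        }
    }

    // Hypothetical bundle without a charset entry.
    static class MessagesEn extends ListResourceBundle {
        @Override
        protected Object[][] getContents() {
            return new Object[][] {{"greeting", "Hello"}};
        }
    }

    // Returns a content type using the bundle's charset when the bundle
    // names one that this JVM supports, and the fallback charset otherwise.
    static String contentTypeFor(ResourceBundle bundle, String fallbackCharset) {
        String charset = fallbackCharset;
        try {
            String candidate = bundle.getString(CHARSET_KEY);
            if (Charset.isSupported(candidate)) {
                charset = candidate;
            }
        } catch (MissingResourceException | IllegalArgumentException e) {
            // No entry, or a malformed charset name: keep the fallback.
        }
        return "text/html; charset=" + charset;
    }

    public static void main(String[] args) {
        System.out.println(contentTypeFor(new MessagesRu(), "ISO-8859-1"));
        System.out.println(contentTypeFor(new MessagesEn(), "ISO-8859-1"));
    }
}
```

As said above, this only helps when a bundle is actually engaged.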

JL> It does not address the case where browser-sensing capabilities are
JL> enabled and the best-matching locale is determined from the client's
JL> preferred and the available locales.

JL> Even in the case where browser-sensing capabilities for locales are
JL> disabled, there is no guarantee that the locale specified in
JL> javax.servlet.jsp.jstl.i18n.locale is the one that's being used. If
JL> it's not available, the JDK falls back on the default locale of the
JL> JSP container's runtime. (While this can be detected in the case of
JL> java.util.ResourceBundle (by calling ResourceBundle.getLocale() and
JL> comparing the returned locale with the desired locale that was
JL> specified in the call to ResourceBundle.getBundle()), no such support
JL> is available for java.text.NumberFormat or java.text.DateFormat.)
Err.. so, if I understand you correctly, the problem is as follows:

1) getAttribute("javax.servlet.jsp.jstl.i18n.locale") returns
   some unsupported locale.
   {e.g. new Locale("ml","MO"), where "ml" stands for "moon
   language" and "MO" stands for "MOON" :) }
2) DateFormat.getDateInstance for such a locale returns the
   default date formatter
3) This <formatDate> tag happens not to be inside any
   <locale> or <bundle> tag and thus has the privilege of
   triggering response.setLocale()

The problem is what locale to pass to response.setLocale()?
I see two possible answers:
a) still pass new Locale("ml","MO")
b) find out that the runtime does not support ml-MO by checking the
   locale returned by getAttribute("javax.servlet.jsp.jstl.i18n.locale")
   against the list returned by java.text.DateFormat.getAvailableLocales().
   This way we could deduce the best fitting locale (that is
   new Locale("ml"), or just the default locale) and pass that
   locale to response.setLocale().
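
Option b) could look roughly like the sketch below. It is only an illustration under my own assumptions: the class and method names are made up, and the fallback order (exact locale, then language only, then a default) is just the order described above, not what the taglib does today.

```java
import java.text.DateFormat;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class SupportedLocaleSketch {
    // Picks the requested locale if it is available, else its language-only
    // form, else the given fallback.
    static Locale bestSupported(Locale requested, Set<Locale> available, Locale fallback) {
        if (available.contains(requested)) {
            return requested;
        }
        Locale languageOnly = Locale.forLanguageTag(requested.getLanguage());
        if (available.contains(languageOnly)) {
            return languageOnly;
        }
        return fallback;
    }

    // Same check against the locales the runtime's DateFormat knows about.
    static Locale bestSupportedForDates(Locale requested) {
        Set<Locale> available =
            new HashSet<>(Arrays.asList(DateFormat.getAvailableLocales()));
        return bestSupported(requested, available, Locale.getDefault());
    }

    public static void main(String[] args) {
        // Equivalent to new Locale("ml", "MO") from the example above.
        Locale requested = Locale.forLanguageTag("ml-MO");
        // The result depends on the locales installed in this runtime.
        System.out.println(bestSupportedForDates(requested));
    }
}
```

The locale returned here, rather than the requested one, would then be the one handed to response.setLocale().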

As far as I understand, solution a) is currently implemented.

Did I understand correctly what problem you were talking about?

I confess that if the locale is determined by <formatDate> or
<formatNumber> in the absence of the javax.servlet.jsp.jstl.i18n.basename
attribute, the approach that I mentioned earlier, putting the charset
into the bundle, won't work.

JL> Also, it seems that the mechanism by which an application can set or
JL> override the container's locale-to-charset mapping should be part of
JL> the servlet spec, not JSTL. See my reply to Kazuhiro Kazama's message:

JL>   Kazuhiro Kazama wrote:

JL>   > ii) Some browsers uses an low-quality unicode font to display UTF-8
JL>   > encoded characters.
JL>   > 
JL>   > And thus I would like to propose JSTL support multiple locale/multiple
JL>   > charset model and provide a database function to get a charset by
JL>   > specified locale. For example, Tomcat 4 provides
JL>   > org.apache.catalina.util.CharsetMapper internally for this purpose.
JL>   > 
JL>   > But note that a locale may convined to multiple charsets. For example,
JL>   > "ja" locale is convined to one of "Shift_JIS", "ISO-2022-JP",
JL>   > "EUC-JP", "Windows-31J" etc. Because Shift_JIS has a difference
JL>   > mapping from Windows-31J, we must select one according to a Web
JL>   > application.
JL>   > 
JL>   > Therefore it is a best solution to provide a database function to
JL>   > search a default charset-locale mapping and its override mechanism by
JL>   > a Web application.
JL>   > 
JL>   > For example, in web.xml:
JL>   >     <charset-mapping>
JL>   >         <charset>ISO-8859-1</charset>
JL>   >         <locale>en</locale>
JL>   >     </charset-mapping>
JL>   >     <charset-mapping>
JL>   >         <charset>Shift_JIS</charset>
JL>   >         <locale>ja</locale>
JL>   >     </charset-mapping>
JL>   > 
JL>   > This proposal may need more discussions in JSR-52, JSR-53 and JSR-154
JL>   > experts and Apache committers.

JL>   I agree JSR-154 might be a more appropriate forum to discuss this. The
JL>   javadocs of the ServletResponse.setLocale() method in Servlet 2.3 says
JL>   that this method should also set the charset, and servlet containers
JL>   use either a public or private mapping between Locale and
JL>   charset. Allowing an application to override this mapping should be
JL>   part of the servlet spec, not JSTL, I think.

Oh, I see that this question has already been raised earlier!
Yes, building the locale-to-charset mapping into the servlet spec
will solve the problem.
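
Just to illustrate what such an overridable locale-to-charset mapping might amount to, here is a small sketch. It is not Tomcat's CharsetMapper and not anything from the servlet spec; the class, its methods and the lookup order (language_COUNTRY first, then language, then a default) are my own assumptions.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class LocaleCharsetMapperSketch {
    private final Map<String, String> mapping = new HashMap<>();
    private final String defaultCharset;

    LocaleCharsetMapperSketch(String defaultCharset) {
        this.defaultCharset = defaultCharset;
    }

    // Registers or overrides a mapping; the key is "ja" or "ru_RU" style.
    void map(String localeKey, String charset) {
        mapping.put(localeKey, charset);
    }

    // Looks up language_COUNTRY first, then the language alone,
    // then falls back to the default charset.
    String charsetFor(Locale locale) {
        String charset = mapping.get(locale.getLanguage() + "_" + locale.getCountry());
        if (charset == null) {
            charset = mapping.get(locale.getLanguage());
        }
        return charset != null ? charset : defaultCharset;
    }

    public static void main(String[] args) {
        LocaleCharsetMapperSketch mapper = new LocaleCharsetMapperSketch("ISO-8859-1");
        mapper.map("ja", "Shift_JIS");      // container-style default
        mapper.map("ja", "Windows-31J");    // application override
        mapper.map("ru_RU", "windows-1251");
        System.out.println(mapper.charsetFor(Locale.JAPAN));
        System.out.println(mapper.charsetFor(Locale.forLanguageTag("ru-RU")));
        System.out.println(mapper.charsetFor(Locale.FRANCE));
    }
}
```

The web.xml entries quoted above would simply feed the map() calls at application startup.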

Do you know anything about their progress on that?
 Has that been submitted to the Servlet 2.4 EG?
  If yes, are they planning to provide _anything_ for that?
  If not, should anybody submit a letter to jsr-154-comments@jcp.org?

We have to admit that the problem really exists!
- as mentioned earlier, all Tomcat and WebLogic servers ignore
  the spec in that they do not seem to set charset upon
- as mentioned earlier, e.g. by Kazuhiro Kazama, it is often
  desirable to control the locale2charset mapping

So the question is: should we wait for 2.4 to become a reality,
or should this functionality temporarily be built into
Servlet 2.2/2.3 taglibs as a workaround?

The answer partially depends on when the 2.4 spec is going to be out.
(Anyway, there is commonly a significant lag until the vendors
ship products that support the spec. For IBM, for example, it will
take at least another 6 months to go to 2.4 after the spec comes out,
as they are going to have something for Servlet 2.3 no earlier than the
spring after Servlet 2.3 was approved in September :-( )

I imagine such hacks may not be welcome in the standard taglib/JSTL, as
the Servlet 2.4 spec is (hopefully! :-) going to help with this, but
maybe the jakarta i18n taglib should still host a workaround for this.

I would especially like to point out that the problem clearly does not
show up with European languages. While the euro currency unites only
the "Core" Europe, the omnipresent ISO-8859-1 seems to unite Europe and
both Americas! ;-)

But the poor :,) people who have to use Asian languages, Cyrillic languages
and so on and so forth are really feeling very upset :)
We're having plenty of trouble as we can use neither jakarta/i18n
nor the standard/fmt taglib unless we use the "utf-8" encoding!
And we often do not want to use "utf-8".
Why? Well, I guess that this is
- largely a matter of tradition: people have just got used to serving
  the documents in their favourite encoding and get a kind of
  satisfaction from being able to continue doing what they did when
  they were just young chickens!
- the documents are sometimes smaller in older encodings than
  in "utf-8". E.g. with Cyrillic, we use one byte per char
  in older encodings and two bytes per char with utf-8.
  And bandwidth is still something to save, especially
  in not so well developed countries.. :-(
- Kazuhiro Kazama has given another reason for not using "utf-8"
  (the browsers that he needs to support render
  "utf-8" encoded documents badly)
I guess there may be some other reasons for supporting older
charsets, maybe legacy user agents, maybe something else..
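
The size argument is easy to check with a few lines of Java. This is just a sketch: the class name and the sample word are mine, and it assumes the JVM ships the windows-1251 and KOI8-R charsets, which the usual JDKs do.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingSizeSketch {
    // Number of bytes the text occupies in the given charset.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // Six Cyrillic letters (the Russian word for "hello").
        String text = "\u041f\u0440\u0438\u0432\u0435\u0442";
        System.out.println(encodedLength(text, Charset.forName("windows-1251"))); // 6
        System.out.println(encodedLength(text, Charset.forName("KOI8-R")));       // 6
        System.out.println(encodedLength(text, StandardCharsets.UTF_8));          // 12
    }
}
```

Markup and Latin text cost the same in all three, so a whole page does not double in size, only its Cyrillic part.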

Pleeease, help us, poor Asian/Slavic/xyz people! :-)
Best regards,
 Anton Tagunov                            mailto:tagunov@newmail.ru

To unsubscribe, e-mail:   <mailto:taglibs-dev-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:taglibs-dev-help@jakarta.apache.org>
