jakarta-taglibs-dev mailing list archives

From Anthony Tagunov <tagu...@newmail.ru>
Subject Programmer-defined Locale->charset mapping (was: Re[2]: standard/fmt or i18n the problem is still there)
Date Wed, 23 Jan 2002 09:10:05 GMT
Hello Tim!

TD> Anton,

TD> I've used WebLogic 6.1 and it seems to do the right conversion, at least for
TD> japanese (SJIS), but that was only after I complained to them. :-)
Do you have an updated version? Mine is a trial version that I
downloaded for free a while ago, and it still does not do it.

TD> As far
TD> as Tomcat goes, you're right - it doesn't seem to ever change the charset.

TD> I had also been thinking of putting the charset in the bundle, but also
TD> didn't like it because the bundle is really about programmer-data, and
TD> putting i18n-implementation data in there felt, to use your word, like a
TD> kludge.
Yes, I agree; it's not good to mix i18n and programmer data.

TD> I also agree that this is a servlet spec issue, and the charset mapping idea
TD> for web.xml is a good solution. I've also sent in my comments to the
TD> JSR-154.

TD> The mapping suggested gave me an idea - in the interim with the i18n taglib
TD> we can use a context-param in the web.xml that the locale bundle tags will
TD> look at when calling setLocale() - if one is found for the locale (or just
TD> the locale's language?) it will also call setContentType(). e.g.

TD>   <context-param>
TD>     <param-name>
TD>       org.apache.taglibs.i18n.CharsetMap.ru
TD>     </param-name>
TD>     <param-value>
TD>       ISO-8859-5
TD>     </param-value>
TD>   </context-param>
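A minimal sketch of the lookup Tim describes, assuming the context-param naming above and a fallback from the full locale (ru_RU) to its language (ru). The class and method names are hypothetical, not part of the i18n taglib; a Function stands in for ServletContext.getInitParameter so the sketch compiles without the servlet API. The value is trimmed because the param-value above is spread over several lines of whitespace.

```java
import java.util.Locale;
import java.util.function.Function;

// Hypothetical helper, not part of the i18n taglib.
final class CharsetMapLookup {
    // Prefix taken from the context-param name proposed above.
    static final String PREFIX = "org.apache.taglibs.i18n.CharsetMap.";

    // Returns e.g. "text/html; charset=ISO-8859-5", or null when no mapping is
    // configured (the tag would then leave the container's default untouched).
    static String contentTypeFor(Locale locale, Function<String, String> initParam) {
        String charset = initParam.apply(PREFIX + locale.toString());  // e.g. "ru_RU"
        if (charset == null) {
            charset = initParam.apply(PREFIX + locale.getLanguage()); // e.g. "ru"
        }
        return charset == null ? null : "text/html; charset=" + charset.trim();
    }

    private CharsetMapLookup() {}
}
```

Inside a tag this could be called as contentTypeFor(locale, pageContext.getServletContext()::getInitParameter) right after response.setLocale(locale), passing a non-null result to response.setContentType(...). The order matters because under Servlet 2.3 setLocale may itself set the Content-Type charset.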
That's it exactly!
Hearing this makes me grin like the Cheshire Cat :-)))

TD> Can you resend the russian properties file?
Here they are, with pleasure! If you want any other wording,
please let me know (in English :)
  (The .orig file is encoded in windows-1251; the .properties
  file was produced from it with native2ascii -encoding windows-1251 xxx yyy)
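For readers who have not used native2ascii: it decodes the source file with the given encoding and rewrites every non-ASCII character as a backslash-u Unicode escape, which is what Properties files need because Properties.load(InputStream) reads bytes as ISO-8859-1. The sketch below only approximates that step; it is not the JDK tool itself, and the class name is made up.

```java
import java.nio.charset.Charset;

// Illustrative approximation of native2ascii; not the JDK tool.
final class Native2AsciiSketch {
    // Decodes raw bytes with the source encoding (e.g. windows-1251), then
    // replaces every non-ASCII char with a backslash-u escape of its code unit.
    static String toAscii(byte[] raw, Charset sourceEncoding) {
        String text = new String(raw, sourceEncoding);
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                out.append(c); // plain ASCII passes through unchanged
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    private Native2AsciiSketch() {}
}
```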
TD> I'll try that out with this solution.

TD> The charset you'd map to is ISO-8859-5, right?
(The servlet engines that do perform Locale->charset mapping
use ISO-8859-1 for Russian, not windows-1251 or KOI8-R; that's why a
programmer-defined mapping is so welcome!)

TD> I should be able to put this in today if this approach is acceptable.
IMO what has been proposed is highly welcome.
Still, I'm mad enough to propose some extra functionality;
see the P.S. below.
TD> Tim

>> -----Original Message-----
>> From: tagunov [mailto:tagunov@motor.ru]
>> Sent: Tuesday, January 22, 2002 5:45 AM
>> To: Tim Dawson
>> Subject: standard/fmt or i18n the problem is still there (was: Re: _ja file is ISO2022JP, not SJIS coded)
>> Hello Tim!
>> Glad to hear from you again :-)
>> TD> thanks for the note - I've checked it in and ensured that it works.
>> You mean "works, but not in the way I expected it to work"? ;-)
>> My English-Russian dictionary says that a "kludge" is a "piece of
>> code or a program that works even though it shouldn't". ;-))
>> TD> not that it matters to you now that you're hot on the trail of the standard taglib.
>> TD> :-)
>> Well, the problems are still there, but they are no longer yours ;-) !
>> (And I'm not too hot on the standard taglib after all :)
>> And the problem is that it is highly desirable to have a
>> programmer-defined Locale->charset mapping,
>> both because developers often do not like the default one and because
>> none of the Tomcat and WebLogic versions seem to perform _any_ Locale->charset
>> mapping themselves. (They all use ISO-8859-1.)
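A small illustration of why the ISO-8859-1 default hurts Russian pages: Cyrillic has no representation in ISO-8859-1, and String.getBytes(Charset) silently substitutes the charset's replacement byte ('?') for each such character, whereas ISO-8859-5 (the charset in Tim's context-param example) round-trips the text. The class name is illustrative only.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative only: what a response writer effectively does with the text.
final class DefaultCharsetDemo {
    // Encodes with the given charset and decodes again. getBytes(Charset)
    // never throws on unmappable characters; it writes the replacement byte.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    static boolean survives(String text, Charset charset) {
        return roundTrip(text, charset).equals(text);
    }

    static final Charset LATIN1 = StandardCharsets.ISO_8859_1;

    private DefaultCharsetDemo() {}
}
```

Called on a six-letter Russian word, roundTrip with ISO-8859-1 yields six question marks, while survives(text, Charset.forName("ISO-8859-5")) is true.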
>> Maybe I'll end up taking your (now deprecated :-) i18n taglib or standard/fmt
>> and replacing the calls to response.setLocale(locale) with
>> MyUtil.setLocale(response, locale), where MyUtil will do this
>> programmer-defined Locale->charset mapping.
>> Jan Luehe's opinion on taglibs-dev was that such mapping is more
>> appropriate in the core of the servlet spec. He recommended
>> sending a proposal to jsr-154 (servlet 2.4 EG).
>> I did.
>> But both you and I can see that now, when the servlet 2.3 spec
>> has been out for half a year already, a great many people are
>> still using servlet 2.2 and JSP 1.2 software.
>> So 2.4 is far away. Even when it comes, it will be a long time
>> until it is widely adopted. Even then some people will still use
>> servlet 2.2 and 2.3 software.
>> So my opinion is that
>>  1) yes, Locale->charset mapping is most appropriate
>>     in the core of the servlet 2.4 spec
>>  2) it is a good idea to implement a temporary
>>     substitute for it
>> But then, the standard taglib is an implementation of the
>> forthcoming JSTL spec. We cannot expect support for
>> such a temporary workaround in the spec.
>> Hence, two ways remain:
>>  a) implement it in your soon-to-be-deprecated i18n taglib :-)
>>  b) let everyone who needs it tailor the taglib on their own
>> (See my P.S. section for a draft of the workaround I'm speaking about.)
>> >> -----Original Message-----
>> >> From: tagunov [mailto:tagunov@motor.ru]
>> >> Sent: Thursday, November 29, 2001 2:35 PM
>> >> To: Tim Dawson
>> >> Subject: _ja file is ISO2022JP, not SJIS coded
>> >>
>> >>
>> >> Hello Tim!
>> >>
>> >> I have discovered that the sample bundle
>> >>
>> >> i18n\examples\src\org\apache\taglibs\i18n\i18n-test_ja.properties
>> >>
>> >> contains ISO2022JP-coded text, not SJIS-coded text; that is why
>> >> the <native2ascii encoding="SJIS".. in the build.xml
>> >> japanese.encoding task does not work properly on this file.
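One way to catch this kind of mix-up before native2ascii runs, sketched under the assumption that a Japanese bundle is either Shift_JIS or ISO-2022-JP: ISO-2022-JP is a 7-bit encoding that switches into JIS X 0208 mode with the escape sequences ESC $ @ or ESC $ B, which Shift_JIS text does not contain. This is a heuristic with a made-up class name, not a general charset detector.

```java
// Heuristic sketch: distinguishes ISO-2022-JP from Shift_JIS input only.
final class Iso2022JpSniffer {
    private static final byte ESC = 0x1B;

    // True if the bytes contain ESC $ @ or ESC $ B, the ISO-2022-JP
    // designations for JIS C 6226-1978 and JIS X 0208-1983 kanji.
    static boolean looksLikeIso2022Jp(byte[] data) {
        for (int i = 0; i + 2 < data.length; i++) {
            if (data[i] == ESC && data[i + 1] == '$'
                    && (data[i + 2] == '@' || data[i + 2] == 'B')) {
                return true;
            }
        }
        return false;
    }

    private Iso2022JpSniffer() {}
}
```

A build step could call this on the raw bundle bytes and choose -encoding ISO2022JP instead of SJIS when it returns true.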
>> Best regards,
>>   Anton Tagunov     mailto:tagunov@motor.ru
>> P.S. Here is what the temporary workaround I'm speaking about could look like:
>> -----------------------------------------------------------------------
>> Jan Luehe's letter contained the following excerpt:
>>   Kazuhiro Kazama wrote:
>>   > ii) Some browsers use a low-quality Unicode font to display UTF-8
>>   > encoded characters.
>>   >
>>   > And thus I would like to propose that JSTL support a multiple
>>   > locale/multiple charset model and provide a database function to get
>>   > a charset for a specified locale. For example, Tomcat 4 provides
>>   > org.apache.catalina.util.CharsetMapper internally for this purpose.
>>   >
>>   > But note that a locale may be associated with multiple charsets. For
>>   > example, the "ja" locale is associated with one of "Shift_JIS",
>>   > "ISO-2022-JP", "EUC-JP", "Windows-31J" etc. Because Shift_JIS has a
>>   > different mapping from Windows-31J, we must select one according to
>>   > the Web application.
>>   >
>>   > Therefore the best solution is to provide a database function to
>>   > search a default charset-locale mapping, plus a mechanism for a Web
>>   > application to override it.
>>   >
>>   > For example, in web.xml:
>>   >     <charset-mapping>
>>   >         <charset>ISO-8859-1</charset>
>>   >         <locale>en</locale>
>>   >     </charset-mapping>
>>   >     <charset-mapping>
>>   >         <charset>Shift_JIS</charset>
>>   >         <locale>ja</locale>
>>   >     </charset-mapping>
>>   >
>>   > This proposal may need more discussion among the JSR-52, JSR-53
>>   > and JSR-154 experts and Apache committers.
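Kazama's point about Shift_JIS versus Windows-31J is visible directly in the JDK converters: the byte pair 0x81 0x60 decodes to U+301C WAVE DASH under Shift_JIS but to U+FF5E FULLWIDTH TILDE under windows-31j, so text written with one mapping and labelled with the other can show the wrong character. The class name is illustrative only.

```java
import java.nio.charset.Charset;

// Illustrative only: the same two bytes, decoded by two "ja" charsets.
final class WaveDashDemo {
    static String decode(String charsetName) {
        byte[] bytes = { (byte) 0x81, (byte) 0x60 };
        return new String(bytes, Charset.forName(charsetName));
    }

    private WaveDashDemo() {}
}
```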
>> The idea is not that bad, and I believe it could be implemented
>> somewhere in the i18n taglib.
>> My other solution was to put the charset name into the
>> bundle, but that is not exactly the same, as in some cases the Locale
>> is determined by the tags that format dates and numbers in the absence
>> of a bundle (at least this is the case with standard/fmt).
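If the charset-mapping elements from Kazama's example were read by the i18n taglib (or by a helper servlet), a DOM-based sketch could look like the following. It assumes well-formed XML, with each locale element closed by a matching locale end tag, and it is not an existing Tomcat or JSTL API; the class name is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical reader for the proposed charset-mapping elements.
final class CharsetMappingParser {
    // Returns locale -> charset; a later mapping for the same locale wins.
    static Map<String, String> parse(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Map<String, String> table = new LinkedHashMap<>();
        NodeList mappings = doc.getElementsByTagName("charset-mapping");
        for (int i = 0; i < mappings.getLength(); i++) {
            Element mapping = (Element) mappings.item(i);
            String locale = childText(mapping, "locale");
            String charset = childText(mapping, "charset");
            if (locale != null && charset != null) {
                table.put(locale, charset);
            }
        }
        return table;
    }

    // Trimmed text of the first descendant element with this name, or null.
    private static String childText(Element parent, String name) {
        NodeList nodes = parent.getElementsByTagName(name);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    private CharsetMappingParser() {}
}
```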

Best regards,
 Anton Tagunov                            mailto:tagunov@motor.ru


BTW, I've just had one more idea about how to extend this and make the
charset selection even more dynamic.

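We define an interface, something like

interface CharsetMapper {
  // returns the name of the charset to use for this request and locale
  String getCharset(javax.servlet.http.HttpServletRequest request,
                    java.util.Locale locale);
}

(the request is passed to make it possible to examine request
parameters, session attributes and cookies)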

The taglib searches for an attribute with a special name
in all the scopes (request, session and application).
If an object is found, it is cast to CharsetMapper and used.
The current request is passed as the first parameter; the mapper can
reach the session through request.getSession(false) without creating one.
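The scope search might look roughly like this sketch. Plain Maps stand in for the request, session and application scopes, and the single-argument mapper interface is a simplified, hypothetical stand-in for CharsetMapper, so the code runs without the servlet API. In a JSP tag, PageContext.findAttribute(name) already performs a similar search (page, request, session, application). Using instanceof rather than a bare cast means an unrelated object stored under the same name is skipped instead of causing a ClassCastException.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Sketch of the attribute search; Maps stand in for the servlet scopes.
final class MapperLookup {
    // Hypothetical simplified stand-in for the CharsetMapper interface.
    interface LocaleCharsetMapper {
        String getCharset(Locale locale);
    }

    // Scopes are searched in the order given (narrowest first);
    // returns null when no scope holds a mapper under that name.
    static LocaleCharsetMapper find(String attributeName, List<Map<String, Object>> scopes) {
        for (Map<String, Object> scope : scopes) {
            Object candidate = scope.get(attributeName);
            if (candidate instanceof LocaleCharsetMapper) {
                return (LocaleCharsetMapper) candidate;
            }
        }
        return null; // caller falls back to the context-param lookup
    }

    private MapperLookup() {}
}
```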

To handle reading the mapping data from web.xml we could go one of
the following two ways:
1) write a special servlet. In its init method it will read its own
   init parameters (or similar context parameters),
   create an object implementing the CharsetMapper interface and bind
   it to the application scope.

   This servlet will do nothing else: it will have the default doGet and
   doPost and won't be bound to any path in the servlet engine.
   (I hope that won't prevent it from being initialized? If it does, we'll
   bind it to some unused path :-)
2) if no object matching the special name has been found in any scope,
   the taglib code would search for the context parameters already described.
A use case for such dynamic charset selection:

  somewhere in the site there's an explicit charset switch:

     enable highly-multilingual pages (use UTF-8)
     optimize for speed               (use national encodings)

  or even a more detailed switch

  Choose the charset for the xxx language: xxx

  Five years ago many Russian sites did this.

  I believe that was due to incompatibilities between browsers
  and their failure to support Cyrillic properly. These
  difficulties have been overcome by now and such selectors
  have almost disappeared. Still, I can imagine them being
  implemented for some emergency cases.
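Such a charset switch could be served by a mapper like the following sketch, assuming the user's choice is stored as a cookie or session value ("utf-8" meaning the multilingual option) and that languages without a configured national encoding fall back to UTF-8. All names here are hypothetical.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical mapper honouring a per-user UTF-8 / national-encoding switch.
final class PreferenceCharsetChooser {
    // Assumed value stored by the site's switch when the user picks UTF-8.
    static final String PREFER_UTF8 = "utf-8";

    static String choose(String userPreference, Locale locale, Map<String, String> nationalCharsets) {
        if (PREFER_UTF8.equalsIgnoreCase(userPreference)) {
            return "UTF-8"; // "enable highly-multilingual pages"
        }
        // "optimize for speed": single-byte national encoding if one is configured
        return nationalCharsets.getOrDefault(locale.getLanguage(), "UTF-8");
    }

    private PreferenceCharsetChooser() {}
}
```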

Your opinions? Is this overkill?

P.P.S. Maybe if enough people find this useful,
       we could propose it to jsr-154 too?
