jakarta-taglibs-dev mailing list archives

From "Tim Dawson" <tdawso...@yahoo.com>
Subject RE: Programer-defined Locale->charset mapping (was: Re[2]: standard/fmt or i18n the problem is still there)
Date Tue, 22 Jan 2002 22:05:19 GMT
> BTW, i've just got one more idea, how to extend this and make the
> charset selection even more dynamic
>
> we define some interface, something like
>
> interface CharsetMapper{
>   String getCharset(HttpRequest rec, java.util.Locale loc);
> }

I generally like pluggable interfaces, and we could do that, but I don't see
many other taglibs using this kind of pattern. A pattern that I have seen a
lot, and that would also give you more dynamic selection, is to search for
"org.apache.taglibs.i18n.CharsetMap.<locale>" in the (page, request,
session, application) scope hierarchy, then default to the context-param
defined in web.xml if nothing else is available. (Incidentally, I got this
approach from Jan; this is how the standard taglib is planning to do a
number of things.)
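
A minimal sketch of that lookup order, not the taglib's actual code. To keep
it runnable without a servlet container, plain Maps stand in for the four
scopes and for the web.xml context-params; the class name CharsetLookup, the
method resolve, and the fallback from the full locale name to the bare
language are all assumptions made for illustration.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: Maps stand in for the JSP scopes and context-params.
class CharsetLookup {
    static final String PREFIX = "org.apache.taglibs.i18n.CharsetMap.";

    /**
     * scopes must be ordered page, request, session, application.
     * contextParams stands in for the context-param entries of web.xml.
     * Returns null when no charset is configured for the locale.
     */
    static String resolve(Locale locale,
                          List<Map<String, Object>> scopes,
                          Map<String, String> contextParams) {
        // Assumption: try the full locale name (e.g. "ru_RU") first,
        // then the language alone (e.g. "ru").
        String[] keys = {
            PREFIX + locale.toString(),
            PREFIX + locale.getLanguage()
        };
        for (String key : keys) {
            // Narrowest scope wins, so a page or session value overrides
            // whatever the application scope holds.
            for (Map<String, Object> scope : scopes) {
                Object value = scope.get(key);
                if (value != null) {
                    return value.toString();
                }
            }
            // Nothing in any scope: fall back to the static web.xml value.
            String param = contextParams.get(key);
            if (param != null) {
                return param.trim();
            }
        }
        return null;
    }
}
```

In a real tag the scope search would presumably go through the page context
rather than hand-built Maps, and the result would feed the setContentType()
call discussed further down the thread.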

This way you could use an init servlet to load the map from somewhere, say,
a relational database, into the application scope if the
ServletContext/web.xml approach is too static.
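
Roughly what such an init step might do, again only as a sketch: a
java.util.Properties object stands in for whatever the real source is (a
database query, say), and a Map stands in for the application scope. The
names CharsetMapLoader and publish are made up for this example.

```java
import java.util.Map;
import java.util.Properties;

// Hypothetical loader: copies locale -> charset pairs into application scope.
class CharsetMapLoader {
    static final String PREFIX = "org.apache.taglibs.i18n.CharsetMap.";

    /**
     * mappings holds entries such as ru=windows-1251.
     * applicationScope stands in for the servlet context attributes.
     */
    static void publish(Properties mappings, Map<String, Object> applicationScope) {
        for (String localeName : mappings.stringPropertyNames()) {
            // Store each charset under the same prefixed name the tags search for.
            String charset = mappings.getProperty(localeName).trim();
            applicationScope.put(PREFIX + localeName, charset);
        }
    }
}
```

An init servlet would call something like this once at startup; the tags
would then find the values in application scope without caring where they
came from.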

You could also provide user/session-based overrides, to allow

>   or even a more detailed switch
>
>   Choose charset for the xxx language: xxx
>                                        yyy
>                                        zzz

And even be able to have a JSP set something at the page/request level for

>   somewhere in the site there's an explicit charset switch:
>
>      enable highly-multilingual pages (use UTF-8)
>      optimize for speed               (use national encodings)
>

Would this work?

Tim


> -----Original Message-----
> From: tagunov [mailto:tagunov@motor.ru]
> Sent: Tuesday, January 22, 2002 2:56 PM
> To: Tim Dawson
> Cc: taglib-dev@jakarta.apache.org
> Subject: Programer-defined Locale->charset mapping (was: Re[2]:
> standard/fmt or i18n the problem is still there)
>
>
> Hello Tim!
>
> TD> Anton,
>
> TD> I've used WebLogic 6.1 and it seems to do the right
> conversion, at least for
> TD> japanese (SJIS), but that was only after I complained to them. :-)
> Do you have some updated version? The one that I have is a trial
> version that I downloaded for free a while ago. Mine still
> does not do it.
>
> TD> As far
> TD> as tomcat goes, you're right - it doesn't seem to ever
> change the charset.
>
> TD> I had also been thinking of putting the charset in the
> bundle, but also
> TD> didn't like it because the bundle is really about
> programmer-data, and
> TD> putting i18n-implementation data in there felt, to use
> your word, like a
> TD> kludge.
> :-)
> Yes, I agree, that's not good to mix i18n and programmer data.
>
> TD> I also agree that this is a servlet spec issue, and the
> charset mapping idea
> TD> for web.xml is a good solution. I've also sent in my
> comments to the
> TD> JSR-154.
>
> TD> The mapping suggested gave me an idea - in the interim
> with the i18n taglib
> TD> we can use a context-param in the web.xml that the locale
> bundle tags will
> TD> look at when calling setLocale() - if one is found for
> the locale (or just
> TD> the locale's language?) it will also call setContentType(). e.g.
>
> TD>   <context-param>
> TD>     <param-name>
> TD>       org.apache.taglibs.i18n.CharsetMap.ru
> TD>     </param-name>
> TD>     <param-value>
> TD>       ISO-8859-5
> TD>     </param-value>
> TD>   </context-param>
> That's just it, absolutely!
> Feels like Cheshire Cat to hear this :-)))
>
> TD> Can you resend the russian properties file?
> Here they are, with pleasure! If you want any other wording
> please let me know (in english :)
>   (The .orig file is encoded with windows-1251, the .properties
>   file was obtained from it with native2ascii -encoding
> windows-1251 xxx yyy)
> TD> I'll try that out with this solution.
> :-)))
>
> TD> The charset you'd map to is ISO-8859-5, right?
> windows-1251
> (The servlet engines that do perform Locale->charset mapping
> use ISO-8859-1, not windows-1251 or KOI8-R for russian that's why the
> programmer defined mapping is so welcome!)
>
> TD> I should be able to put this in today if this approach is
> acceptable.
> IMO what has been proposed is highly welcome.
> Still I'm mad enough to propose extra functionality,
> see the P.S. below.
> TD> Tim
>
> >> -----Original Message-----
> >> From: tagunov [mailto:tagunov@motor.ru]
> >> Sent: Tuesday, January 22, 2002 5:45 AM
> >> To: Tim Dawson
> >> Subject: standard/fmt or i18n the problem is still there
> (was: Re: _ja
> >> file is ISO2022JP, not SJIS coded)
> >>
> >>
> >> Hello Tim!
> >> Glad to hear from you again :-)
> >>
> >> TD> thanks for the note - I've checked it in and ensured that
> >> it works.
> >> You mean "works, but not in the way I expected it to work"? ;-)
> >> My english-russian dictionary says that a "kludge" is a "piece of
> >> code or a program that works despite it shouldn't". ;-))
> >> TD> not that it matters to you now that you're hot on the
> >> trail of the standard taglib.
> >> TD> :-)
> >> Well, the problems are still there, but they are no longer
> yours ;-) !
> >> (And I'm not too hot on the standard taglib after all :)
> >>
> >> And the problem is that it is highly desirable to have a
> >> programmer-defined
> >>  Locale->charset mapping
> >> both because developers often do not like the default one
> and because
> >> all Tomcats and Weblogics do not seem to perform _any_
> Locale->charset
> >> mapping themselves. (They all use iso-8859-1).
> >>
> >> Maybe I'll end up taking your (deprecated now :-) or standard/fmt
> >> taglib and replacing the calls to response.setLocale(locale) for
> >> MyUtil.setLocale(response,locale) where MyUtil will do this
> >> programmer-defined Locale->charset mapping.
> >>
> >> Jan Luehe's opinion on taglibs-dev was that such mapping is more
> >> appropriate in the core of the servlet spec. He recommended
> >> sending a proposal to jsr-154 (servlet 2.4 EG).
> >>
> >> I did.
> >>
> >> But both you and me can see that now, that's servlet 2.3 spec
> >> has been out for half a year already, great many people are
> >> still using
> >> servlet 2.2 and jsp 1.2 soft.
> >>
> >> So, 2.4 is far away. Even when it comes it will be a long time
> >> till it is widely adopted. Even then some people will still use
> >> servlet 2.2 and 2.3 soft.
> >>
> >> So my opinion is that
> >>  1) yes, Locale->charset mapping is most appropriate
> >>     in the core of servlet 2.4 spec
> >>  2) it is a good idea to implement a temporary
> >>     substitute for it
> >>
> >> But then, standard taglib is an implementation of JSTL
> >> forthcoming spec. We can not expect support for
> >> such temporary work-around in the spec.
> >>
> >> Hence, two ways remain:
> >>  a) implement it in the going to be deprecated (your :-)
> i18n taglib
> >>  b) let everyone who needs it tailor the taglib on his own
> >>
> >> (See my P.S. section for a draft of this workaround I'm speaking
> >> about)
> >> >> -----Original Message-----
> >> >> From: tagunov [mailto:tagunov@motor.ru]
> >> >> Sent: Thursday, November 29, 2001 2:35 PM
> >> >> To: Tim Dawson
> >> >> Subject: _ja file is ISO2022JP, not SJIS coded
> >> >>
> >> >>
> >> >> Hello Tim!
> >> >>
> >> >> I have discovered that the sample bundle
> >> >>
> >> >>
> i18n\examples\src\org\apache\taglibs\i18n\i18n-test_ja.properties
> >> >>
> >> >> contains ISO2022JP coded text, not SJIS coded, that is why
> >> >> the <native2ascii encoding="SJIS".. in the build.xml
> >> >> japanese.encoding task does not work properly on this file.
> >>
> >> Best regards,
> >>   Anton Tagunov     mailto:tagunov@motor.ru
> >>
> >>
> >> P.S. What this temporary workaround I'm speaking about could be:
> >> --------------------------------------------------------------
> >> ---------
> >> Jan Luehe's letter contained the following excerpt:
> >>   Kazuhiro Kazama wrote:
> >>
> >>   > ii) Some browsers uses an low-quality unicode font to
> >> display UTF-8
> >>   > encoded characters.
> >>   >
> >>   > And thus I would like to propose JSTL support multiple
> >> locale/multiple
> >>   > charset model and provide a database function to get a
> charset by
> >>   > specified locale. For example, Tomcat 4 provides
> >>   > org.apache.catalina.util.CharsetMapper internally for
> >> this purpose.
> >>   >
> >>   > But note that a locale may convined to multiple charsets.
> >> For example,
> >>   > "ja" locale is convined to one of "Shift_JIS", "ISO-2022-JP",
> >>   > "EUC-JP", "Windows-31J" etc. Because Shift_JIS has a difference
> >>   > mapping from Windows-31J, we must select one according to a Web
> >>   > application.
> >>   >
> >>   > Therefore it is a best solution to provide a database
> function to
> >>   > search a default charset-locale mapping and its override
> >> mechanism by
> >>   > a Web application.
> >>   >
> >>   > For example, in web.xml:
> >>   >     <charset-mapping>
> >>   >         <charset>ISO-8859-1</charset>
> >>   >         <locale>en</locale>
> >>   >     </charset-mapping>
> >>   >     <charset-mapping>
> >>   >         <charset>Shift_JIS</charset>
> >>   >         <locale>ja</locale>
> >>   >     </charset-mapping>
> >>   >
> >>   > This proposal may need more discussions in JSR-52, JSR-53
> >> and JSR-154
> >>   > experts and Apache committers.
> >>
> >> The idea is not that bad, and I believe it could be implemented
> >> somewhere in the i18n taglib.
> >>
> >> My other solution was that the name of charset could be
> put into the
> >> bundle but that is not exactly the same, as in some cases
> the Locale
> >> is determined by the tags that format dates and numbers in
> the absence
> >> of a bundle (at least this is the case with standard/fmt).
> >>
>
>
>
> --
> Best regards,
>  Anton Tagunov                            mailto:tagunov@motor.ru
>
> P.S.
>
> BTW, i've just got one more idea, how to extend this and make the
> charset selection even more dynamic
>
> we define some interface, something like
>
> interface CharsetMapper{
>   String getCharset(HttpRequest rec, java.util.Locale loc);
> }
>
> (the HttpRequest is passed to enable examining the request,
> session parameters and cookies)
>
> The taglib searches the environment for some specially-named parameter
> (search is done in all the scopes: request, session and application)
> if an object is found it is cast to CharsetMapper and used.
> request.getSession(false) is passed as the first parameter.
>
> To handle reading the mapping data from the web.xml we could go one of
> the two following ways:
>
> 1) write a special servlet. it will in its init method read its own
>    parameters:
>
>     <servlet>
>         <servlet-name>...</servlet-name>
>         <servlet-class>...</servlet-class>
>         <init-param>
>             <param-name>org.apache.taglibs.i18n.CharsetMap.ru</param-name>
>             <param-value>windows-1251</param-value>
>         </init-param>
>     </servlet>
>
>    (or similar context parameters)
>    create an object implementing CharsetMapper interface and bind
>    it to the application scope.
>
>    This servlet will do nothing else, will have default doGet and
>    doPost and won't be bound to any path in the servlet engine.
>    (Hope it won't prevent it from being initialized? Then we'll bind
>    it to some unused path :-)
>
> 2) if no object has been found matching the special name in any scope
>    then the taglib code would search for the already described context
>    parameters.
>
> A use case for such dynamic charset selection:
>
>   somewhere in the site there's an explicit charset switch:
>
>      enable highly-multilingual pages (use UTF-8)
>      optimize for speed               (use national encodings)
>
>   or even a more detailed switch
>
>   Choose charset for the xxx language: xxx
>                                        yyy
>                                        zzz
>
>   5 years ago many russian sites did this.
>
>   I believe that was due to incompatibilities in the browsers
>   and their failures to support cyrillics properly. These
>   difficulties have been overcome by now and such selectors
>   have almost disappeared. Still I can imagine them being
>   implemented for some emergency cases.
>
> Your opinions? Is this an overkill?
>
> P.P.S. Maybe if enough people think this to be useful
>        enough we could propose this to jsr-154 too?

