cocoon-dev mailing list archives

From Berin Loritsch <blorit...@infoplanning.com>
Subject Re: [RT] i18n in Cocoon and language independent semantic contexts
Date Sun, 11 Jun 2000 18:34:36 GMT
Stefano Mazzocchi wrote:

> The problems i18n poses are big and it's the reason why both Java and
> XML have Unicode support right from their core (a big advantage over
> almost all other programming languages).

Agreed.

> Cocoon = Java + XML, so this means we need to place i18n support right
> into our core, or we'll be doomed by design limitations for the rest of
> its lifetime (and force us to do a cocoon3 to fix design problems)

True that.

>
> Let's see those problems:
>
> 1) internal messages: errors, logs, comments all should be driven by the
> JVM locale. Normally this is performed with Java ResourceBundles.
>
> Is this enough? Should we create an XML version of those resource
> bundles? Is this following the golden-hammer antipattern of "do it
> all with XML"?

As long as we can include different files that specify different languages.
Most i18n systems simply need a way of getting to equivalent resources.
While XML is powerful, it might be overkill here.  The main question is how
we identify the resources.  GNU gettext generates files that associate an
identifier with each resource.  Whoever wants to translate a program that
uses gettext simply takes this file and translates the phrases from one
language to another.  A properties file will work nicely for this type of
application.  Cocoon just needs to know which properties file to get.
If we had a directory called "./i18n/" we could place the files within that
directory, each named with the two-letter language code followed by
".properties".  Anyone who wants to provide a translation takes an existing
file, changes the name to the proper language code, and translates the
messages in there.

Simply stated:
./i18n/en.properties

will be translated to become
./i18n/es.properties
and so on.

The entries will look like:
SERVER500=Server error.
translated to:
SERVER500=Error del servidor.
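
The lookup described above could be sketched roughly as follows.  This is
only an illustration under stated assumptions: MessageCatalog, register and
message are hypothetical names (nothing in Cocoon), the fallback to English
is my own choice, and the properties text is passed in as a string so the
sketch runs without a ./i18n/ directory on disk.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper, not part of Cocoon: looks up a message by key for a
// given language code, falling back to English when no translation exists.
class MessageCatalog {
    private static final String FALLBACK_LANG = "en"; // assumption
    private final Map<String, Properties> byLang = new HashMap<>();

    // In a real setup the text would be read from ./i18n/<lang>.properties.
    void register(String lang, String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byLang.put(lang, props);
    }

    String message(String lang, String key) {
        Properties props = byLang.get(lang);
        String value = (props == null) ? null : props.getProperty(key);
        if (value == null) {
            Properties fallback = byLang.get(FALLBACK_LANG);
            value = (fallback == null) ? null : fallback.getProperty(key);
        }
        // Return the key itself so a missing entry is visible but not fatal.
        return (value == null) ? key : value;
    }
}
```

Note that java.util.Properties keeps everything after the "=" as the value,
so quotation marks around a message would become part of the message.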

> 2) uri space: good URIs don't change and are human readable. The sitemap
> allows you to enforce the first (if you don't use extensions to indicate
> your resources), and your URI-space design should enforce the second
> one.
>
> Be careful, something like "/news/today" is a perfectly designed URI for
> a website and can stand ages without requiring to change. But it's not
> human readable by non-English speakers. So it would be the Italian
> equivalent "/notizie/oggi".

We could accomplish this with simple aliases.  We could also extend the
previously stated (see #1) proposal to include a WEB-INF/i18n/en.properties
suite of files to internationalize the URLs.  That way, we can provide a
mechanism for site internationalization--not necessary for everyone, but
a boon for whoever is willing to use it.  Such a directory should only be
needed if some parameter is used in the sitemap.

That way, we can identify a new namespace so that we can access the
site internationalization:
<sitemap xmlns:i18n="http://xml.apache.org/cocoon/i18n">
  <i18n:resource dir="WEB-INF/i18n/" lang="en"/>
  <process i18n:uri="resourceName"/>
</sitemap>

And i18n:uri, etc. would translate into whatever is the necessary attribute.

In this case, it would be uri="user/add" or something equivalent.  That
way, if I don't want to go through the trouble of internationalizing my
site I can still use the old sitemap schema.  If I find I want to do that,
I have that ability using the namespace.
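
To make the alias idea concrete, here is a minimal sketch of what resolving
i18n:uri="resourceName" might involve.  UriCatalog and its methods are
hypothetical names of mine, not a Cocoon sitemap API, and a real
implementation would fill the tables from the WEB-INF/i18n/ properties files
rather than from method calls.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Cocoon's sitemap API: maps a language-neutral
// resource name to a localized URI and back again.
class UriCatalog {
    // language code -> (resource name -> localized URI)
    private final Map<String, Map<String, String>> uris = new HashMap<>();
    // localized URI -> resource name, across all languages
    private final Map<String, String> reverse = new HashMap<>();

    void alias(String lang, String resourceName, String uri) {
        uris.computeIfAbsent(lang, k -> new HashMap<>()).put(resourceName, uri);
        reverse.put(uri, resourceName);
    }

    // What i18n:uri="resourceName" would expand to for the given language,
    // or null if that language has no alias for the resource.
    String uriFor(String lang, String resourceName) {
        Map<String, String> forLang = uris.get(lang);
        return (forLang == null) ? null : forLang.get(resourceName);
    }

    // Which resource an incoming request URI refers to, whatever its language.
    String resourceFor(String uri) {
        return reverse.get(uri);
    }
}
```

With aliases registered for "news/today" (en) and "notizie/oggi" (it) under
one resource name, both URIs resolve to the same pipeline.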

> And, most important, is something like this worth the effort? (I've
> never seen translated URI spaces, is there a web site that does this?)

It may be the "wave of the future", it may be extra work, but being able
to create one XML document and translate it easily is invaluable.
For example, I specify a DTD that allows me to create a form like this:
<form xmlns:i18n="http://xml.apache.org/cocoon/i18n">
  <i18n:resource dir="WEB-INF/i18n/" lang="en"/>
  <field name="user" type="drop-list">
    <description>
      <i18n:string resource="currentUser"/>
    </description>
    <selection>Stefano Mazzocchi</selection>
    <selection>Berin Loritsch</selection>
  </field>
</form>

A simple mechanism like this is very powerful.  Making it easily
available to the site designer in multiple areas would make this a
killer app--especially if the language detection is done in XSP, in
XSLT, or by the engine.  Basically, we would have one XML "form"
representing the same information, displayed in the user's native
language.

C'est magnifique, non?
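
For what it's worth, the substitution step for <i18n:string/> could look
something like the sketch below.  It uses the standard JAXP/DOM classes;
I18nStringExpander and expand are hypothetical names, the messages map
stands in for whichever properties file was selected, and leaving the key
in place when no message is found is just one possible policy.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch: replaces each <i18n:string resource="key"/> element
// with the text found for "key" in the chosen language's messages.
class I18nStringExpander {
    static final String I18N_NS = "http://xml.apache.org/cocoon/i18n";

    static Document expand(String xml, Map<String, String> messages) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed to match on the namespace URI
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // Copy the live NodeList first; replacing nodes while iterating
            // over it would shift the indices.
            NodeList found = doc.getElementsByTagNameNS(I18N_NS, "string");
            List<Element> targets = new ArrayList<>();
            for (int i = 0; i < found.getLength(); i++) {
                targets.add((Element) found.item(i));
            }
            for (Element el : targets) {
                String key = el.getAttribute("resource");
                // Assumption: fall back to the key when no message exists.
                String text = messages.getOrDefault(key, key);
                el.getParentNode().replaceChild(doc.createTextNode(text), el);
            }
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException("could not expand i18n:string elements", e);
        }
    }
}
```

The same document can then be expanded once per language simply by passing
a different messages map.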

This approach works well as long as the resources are small.  If we
have a press release or some other larger piece of information that
is not a specific resource (the contents of the press release will be
different for each release), then that would be best served by different
XML files--one for each target language.

Forms and functional spaces on a web site would benefit from such
a system.  Generic information, how-tos, etc. will not.

> 3) schemas: this is something I've been concerned about for quite some
> time and maybe some of you who were into the SGML world before can give
> us advice. A schema has one embedded natural language.
>
>  <page xml:lang="en">
>   <title>Hello World!</title>
>   <paragraph>
>    <bold>Hello World!</bold>
>   </paragraph>
>  </page>
>
> can be translated into
>
>  <page xml:lang="it">
>   <title>Ciao a tutti!</title>
>   <paragraph>
>    <bold>Ciao a tutti!</bold>
>   </paragraph>
>  </page>
>
> but this _requires_ authors to understand english to understand the
> markup. The real translation is
>
>  <pagina xml:lang="it">
>   <titolo>Ciao a tutti!</titolo>
>   <paragrafo>
>    <grassetto>Ciao a tutti!</grassetto>
>   </paragrafo>
>  </pagina>

AAAAAHHHHH! Noooo!

All markup should be done by the site designer.  If my native language
is English (which it is), then I would use English markup for my site.
If it were Spanish (I'm only 30% mobile in that language), then I would
use Spanish markup.  The end user should never see the actual markup.
The goal of XML/XSL is to transform the information into a usable
format for the client.  If this format is a graphical view of the
information (which XSL:FO is designed to give), then the end user sees
the information represented graphically.  If the format is a machine
readable and processable format (i.e. Business to Business data exchange
formats), then translating the tags is not only overkill, it will
completely break the system.

This type of thing would also violate the spirit of what XML is meant to
provide: standard, usable information.  To use Microsoft's case for XML,
we have a robot that goes to a site to get weather information.  With HTML
we observe that the information is in the 2nd table, 3rd cell.  If the site
designer has too much caffeine one night, our precious info is now in the
1st div on the page.  If the site had an XML representation, we know that
we are looking for the info in the <weather/> tag.  If we start
internationalizing the tags, then the information may be in the <weather/>
tag for some people, but in a different tag for others.

That would create more chaos than it would solve.  I would venture to
say that if your father is anything like mine, he could not care less
what the markup looks like.
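
The weather robot argument can be shown in a few lines.  This is a sketch
with made-up sample documents; WeatherRobot and readWeather are
hypothetical names, and only the <weather/> tag comes from the example
above.  The robot finds the element by name wherever it sits in the
document, and finds nothing once the tag itself has been translated.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical robot: extracts the weather by tag name, not by position.
class WeatherRobot {
    // Returns the text of the first <weather> element, or null if the
    // document has no element with that name.
    static String readWeather(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("weather");
            return (nodes.getLength() == 0)
                    ? null
                    : nodes.item(0).getTextContent().trim();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }
}
```

Moving <weather/> around inside the document does not affect the robot;
renaming it to a translated tag does.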

As far as the sitemap is concerned, I still think i18n on that is too much.
The sitemap is necessary for Cocoon to read.  If it used tags like <s/>
and <p/> for <sitemap/> and <process/>, Cocoon wouldn't care as long
as it can read it.  The longer names are necessary as long as we don't
have a GUI to control the setup of the sitemap.

> This allows another level of separation of concern where who creates the
> XSLT is an English designer and who writes the XML document is an Italian
> journalist. (yes, the eurofootball.com web site triggered many of these
> thoughts)

What happens when the situations are reversed?  I still say that i18n
on the actual markup introduces too much complexity, too much room
for human error, and too much difficulty in tracking down where the
error lies.  Not to mention that it slows performance to a crawl.

Simple "resource" based i18n works wonderfully for most situations,
and takes very little time to process--and could potentially be easy
to implement.  Anything above this level of i18n becomes very complex
and almost impossible to follow.

There is such a thing as taking a good idea too far.

>                          ------------------ o ------------------
>
> Ok, but what can we do inside Cocoon without having to proprietarily
> extend the XML specifications?

Simple resource files.

> Also, how can we simplify the sitemap evolution without compromising the
> rest of the system?

See #2 above.

> I think a possible solution is sitemap pluggability and compilation.
>
> You could think of the sitemap as a big XSP taglib that is responsible
> for directly driving the execution of the resource creation pipelines.

Talk about learning curve.

> It would also increase performance, since matching could be optimized
> and what not.

It would?  How?

> It would also allow different sitemap schemas to be developed. In
> theory, you could create your own sitemap schema.

Danger, Will Robinson, Danger!

> Well, this collection of RT is admittedly wild.

Agreed :P

> Digest with caution but think about it extensively since I know many FS
> hides between the lines.

I'll keep an open mind.

I have to remember that small and lean doesn't always mean
elegant and optimized.

To pull an example from the analog audio world about the design techniques
used by people of different nationalities:  American circuit designers
believe that the shortest, simplest path for the audio to travel is the
best, because every component introduced increases distortion.  British
circuit designers, however, use as many components as it takes to
counteract the distortion introduced by other components.  The end result
is that British electronics sound warmer and more elegant while American
electronics sound crisper and more sterile.  It is the difference between
aiming for minimal distortion and aiming to make the distortion pleasing
to the ear.  This analogy applies to pro electronics; I have no experience
with British consumer gear.

The way it applies here is that with my American mentality, I am looking
for the simplest, cleanest method to accomplish the same goal.  Stefano,
with a different mindset, is proposing something that can be more elegant
and friendly to the user.
