cocoon-dev mailing list archives

From Stefano Mazzocchi <>
Subject Re: [RT] i18n in Cocoon and language independent semantic contexts
Date Sun, 11 Jun 2000 23:10:01 GMT
Berin Loritsch wrote:

> As long as we can include different files that will specify different
> languages.
> Most i18n systems simply need a way of getting to equivalent resources.
> While XML is powerful, it might be overkill here.  The main question here
> is how we identify the resources.  GNU gettext generates certain tag files
> that have constants associated with a resource.  Whoever wants to translate
> a program that uses gettext simply takes this file and translates the
> phrases from one language to another.  A properties file will work nicely
> for this type of application; Cocoon just needs to know which properties
> file to get.
> If we had a directory called "./i18n/", we could place the files within
> that directory, named with the 2-letter language code followed by
> ".properties", and anyone who wants to provide a translation takes that
> file, changes the name to the proper language code and translates the
> messages in there.
> Simply stated:
> ./i18n/
> will be translated to become
> ./i18n/
> and so on.
> The entries will look like:
> SERVER500=Server error.
> translated to:
> SERVER500=Error del servidor.

This is exactly what Java ResourceBundles do, point for point. We just
have to use the facility.
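To make the ResourceBundle point concrete, here is a minimal Java sketch. It is not Cocoon code. It assumes a hypothetical base name `messages` standing in for the files in ./i18n/. So that the example is self-contained, it writes a default file and a Spanish file to a temporary directory. java.util.ResourceBundle then picks the most specific file for a language and falls back to the default file when no translation exists. Note that the JDK names these files by language code (messages_es.properties).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.ResourceBundle;

public class I18nBundles {
    // Hypothetical base name standing in for the files in ./i18n/.
    static final String BASE = "messages";

    /** Demo setup: writes the default bundle and a Spanish translation into dir. */
    static void writeBundles(Path dir) throws IOException {
        Files.write(dir.resolve(BASE + ".properties"),
                "SERVER500=Server error.\n".getBytes(StandardCharsets.ISO_8859_1));
        Files.write(dir.resolve(BASE + "_es.properties"),
                "SERVER500=Error del servidor.\n".getBytes(StandardCharsets.ISO_8859_1));
    }

    /**
     * Looks up key for a language tag: messages_<lang>.properties first, then
     * messages.properties. The no-fallback control stops ResourceBundle from
     * also trying the JVM's default locale in between.
     */
    static String lookup(Path dir, String languageTag, String key) throws IOException {
        try (URLClassLoader loader = new URLClassLoader(new URL[] { dir.toUri().toURL() })) {
            ResourceBundle bundle = ResourceBundle.getBundle(
                    BASE,
                    Locale.forLanguageTag(languageTag),
                    loader,
                    ResourceBundle.Control.getNoFallbackControl(
                            ResourceBundle.Control.FORMAT_PROPERTIES));
            return bundle.getString(key);
        }
    }

    /** Convenience wrapper: fresh temp directory, one lookup of SERVER500. */
    static String demo(String languageTag) {
        try {
            Path dir = Files.createTempDirectory("i18n");
            writeBundles(dir);
            return lookup(dir, languageTag, "SERVER500");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo("es")); // Error del servidor.
        System.out.println(demo("fr")); // Server error. (no messages_fr, so the default file)
    }
}
```

The explicit ClassLoader is only needed because the files live in a temp directory. In a web application they would normally sit on the classpath. The default control would then also work, but for a missing language it tries the JVM default locale's file before the base file.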

> > 2) uri space: good URIs don't change and are human readable. The sitemap
> > allows you to enforce the first (if you don't use extensions to indicate
> > your resources), and your URI-space design should enforce the second
> > one.
> >
> > Be careful: something like "/news/today" is a perfectly designed URI for
> > a website and can stand for ages without needing to change. But it's not
> > human readable by non-English speakers, whereas the Italian equivalent
> > "/notizie/oggi" would be.
> We could accomplish this with simple aliases.  We could also extend the
> proposal from #1 to include a WEB-INF/i18n/ suite of files to
> internationalize the URLs.  That way, we can provide a mechanism for site
> internationalization--not necessary for everyone, but a boon for whoever
> is willing to use it.  Such a directory should only be needed if the
> corresponding parameter is used in the sitemap.
> To give access to the site internationalization, we can define a new
> namespace:
> <sitemap xmlns:i18n="">
>   <i18n:resource dir="WEB-INF/i18n/" lang="en"/>
>   <process i18n:uri="resourceName"/>
> </sitemap>
> And i18n:uri, etc. would translate into whatever is the necessary attribute.
> In this case, it would be uri="user/add" or something equivalent.  That
> way, if I don't want to go through the trouble of internationalizing my
> site I can still use the old sitemap schema.  If I find I want to do that,
> I have that ability using the namespace.
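A minimal sketch of the lookup table behind the proposed `i18n:uri` attribute follows. It assumes the per-language files in WEB-INF/i18n/ have already been read into memory; the loading step is omitted. `todaysNews` is a hypothetical resource name. Each language publishes its own URI, and every URI resolves back to one language-independent resource.

```java
import java.util.HashMap;
import java.util.Map;

public class LocalizedUris {
    // lang -> (resource name -> URI published in that language)
    private final Map<String, Map<String, String>> byLang = new HashMap<>();
    // published URI -> resource name, across all languages
    private final Map<String, String> byUri = new HashMap<>();

    /** Registers one language's URI for a resource; rejects a URI already used by another resource. */
    void define(String lang, String resource, String uri) {
        String previous = byUri.putIfAbsent(uri, resource);
        if (previous != null && !previous.equals(resource)) {
            throw new IllegalArgumentException(uri + " already maps to " + previous);
        }
        byLang.computeIfAbsent(lang, k -> new HashMap<>()).put(resource, uri);
    }

    /** URI to use when generating a link to resource for a reader of lang (null if none). */
    String uriFor(String resource, String lang) {
        Map<String, String> table = byLang.get(lang);
        return table == null ? null : table.get(resource);
    }

    /** Resource an incoming request URI refers to, whatever its language (null if unknown). */
    String resourceFor(String uri) {
        return byUri.get(uri);
    }

    public static void main(String[] args) {
        LocalizedUris uris = new LocalizedUris();
        uris.define("en", "todaysNews", "news/today");
        uris.define("it", "todaysNews", "notizie/oggi");
        System.out.println(uris.resourceFor("notizie/oggi")); // todaysNews
        System.out.println(uris.uriFor("todaysNews", "it"));  // notizie/oggi
    }
}
```

A single reverse table only works if no two languages reuse the same URI string for different resources. That is why `define` refuses such clashes instead of silently overwriting them.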

Good suggestion. I like this. Powerful yet hidden if not required.

> > And, most important, is something like this worth the effort? (I've
> > never seen translated URI spaces; is there a web site that does this?)
> It may be the "wave of the future", it may be extra work, but being able
> to create one XML document and translate it easily is invaluable.  For
> example:
> I specify a DTD that allows me to create a form like this:
> <form xmlns:i18n="">
>   <i18n:resource dir="WEB-INF/i18n/" lang="en"/>
>   <field name="user" type="drop-list">
>     <description>
>       <i18n:string resource="currentUser"/>
>     </description>
>     <selection>Stefano Mazzochi</selection>

It's "Mazzocchi" damn it! two "z"'s and two "c"'s :)

>     <selection>Berin Loritsch</selection>
>   </field>
> </form>
> Using a simple mechanism like this is very powerful.  Making it easily
> available to the site designer in multiple areas would make this a
> killer app--especially when we place the language detection in XSP,
> XSLT, or the engine.  Basically, we would have one XML "form"
> representing the same information, displayed in the user's native
> language.
> C'est magnifique, non? (It's magnificent, isn't it?)
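Language detection in the engine could start from the browser's Accept-Language header. The sketch below relies on Java 8's `Locale.LanguageRange` and `Locale.lookupTag`, which implement RFC 4647 "lookup" matching; this is a swapped-in JDK facility, not anything Cocoon provides. The list of available languages and the default are hypothetical parameters.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class LanguageDetection {
    /**
     * Picks the best available language for an Accept-Language value.
     * Ranges are tried by descending q-weight, and "it-IT" is truncated to "it"
     * if no exact match exists. Returns defaultTag when nothing matches.
     * Locale.LanguageRange.parse throws IllegalArgumentException on malformed input.
     */
    static String pickLanguage(String acceptLanguage, List<String> available, String defaultTag) {
        if (acceptLanguage == null || acceptLanguage.trim().isEmpty()) {
            return defaultTag; // no header sent
        }
        List<Locale.LanguageRange> ranges = Locale.LanguageRange.parse(acceptLanguage);
        String tag = Locale.lookupTag(ranges, available);
        return tag != null ? tag : defaultTag;
    }

    public static void main(String[] args) {
        List<String> available = Arrays.asList("en", "it");
        System.out.println(pickLanguage("it-IT,it;q=0.8,en;q=0.5", available, "en")); // it
        System.out.println(pickLanguage("fr-CH,fr;q=0.9", available, "en"));          // en
    }
}
```

The chosen tag could then feed the `lang` of the i18n resource lookup, so the same XML "form" is rendered in the reader's language.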

Oui, mais je dois te rappeler... (Yes, but I have to remind you...) oh, sorry, wrong language :)

Yes, but we must not forget that the schema you outlined above is
Cocoon-aware... maybe an i18n filter could make such processing
available without imposing Cocoon-awareness on the schema.

Especially, the above doesn't work for schemas you can't control, for
example XForms, where it would be more portable to define namespaced
"attributes" instead of elements, following the XLink pattern.

> This approach works well as long as the resources are small.  If we
> have a press release or some other larger piece of information that
> is not a specific resource (the contents of the press release will be
> different for each release), then that would be best served by different
> XML files--one for each target language.
> Forms and functional spaces on a web site would benefit from such
> a system.  Generic information, how-tos, etc. will not.

I agree, these are different things and should be handled differently.
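For the larger, document-sized content (one XML file per language), picking the right file could reuse the JDK's candidate-locale rules without loading any ResourceBundle. The sketch below assumes a hypothetical naming scheme, base_lang_COUNTRY.xml, and a set of files known to exist. `release42` is an illustrative base name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.ResourceBundle;
import java.util.Set;

public class LocalizedDocuments {
    // Borrow ResourceBundle's candidate-locale logic; no bundle is ever loaded.
    private static final ResourceBundle.Control CONTROL =
            ResourceBundle.Control.getNoFallbackControl(ResourceBundle.Control.FORMAT_DEFAULT);

    /** File names to try, most specific first, e.g. base_it_IT.xml, base_it.xml, base.xml. */
    static List<String> candidates(String baseName, Locale locale) {
        List<String> names = new ArrayList<>();
        for (Locale candidate : CONTROL.getCandidateLocales(baseName, locale)) {
            names.add(CONTROL.toBundleName(baseName, candidate) + ".xml");
        }
        return names;
    }

    /** First candidate that actually exists, or null if even the default file is missing. */
    static String resolve(String baseName, Locale locale, Set<String> existingFiles) {
        for (String name : candidates(baseName, locale)) {
            if (existingFiles.contains(name)) return name;
        }
        return null;
    }

    public static void main(String[] args) {
        Set<String> files = new HashSet<>(Arrays.asList("release42.xml", "release42_it.xml"));
        System.out.println(candidates("release42", Locale.forLanguageTag("it-IT")));
        System.out.println(resolve("release42", Locale.forLanguageTag("it-IT"), files)); // release42_it.xml
        System.out.println(resolve("release42", Locale.GERMAN, files));                   // release42.xml
    }
}
```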
> > 3) schemas: this is something I've been concerned about for quite some
> > time and maybe some of you who were into the SGML world before can give
> > us advice. Every schema has one natural language embedded in it.
> >
> >  <page xml:lang="en">
> >   <title>Hello World!</title>
> >   <paragraph>
> >    <bold>Hello World!</bold>
> >   </paragraph>
> >  </page>
> >
> > can be translated into
> >
> >  <page xml:lang="it">
> >   <title>Ciao a tutti!</title>
> >   <paragraph>
> >    <bold>Ciao a tutti!</bold>
> >   </paragraph>
> >  </page>
> >
> > but this _requires_ authors to understand english to understand the
> > markup. The real translation is
> >
> >  <pagina xml:lang="it">
> >   <titolo>Ciao a tutti!</titolo>
> >   <paragrafo>
> >    <grassetto>Ciao a tutti!</grassetto>
> >   </paragrafo>
> >  </pagina>

> All markup should be done by the site designer.  If my native language
> is English (which it is), then I would use English markup for my site.
> If it were Spanish (I'm only 30% fluent in that language), then I would
> use Spanish markup.  The end user should never see the actual markup.
> The goal of XML/XSL is to transform the information into a usable
> format for the client.  If this format is a graphical view of the
> information (which XSL-FO is designed to give), then the end user sees
> the information represented graphically.  If the format is machine
> readable and processable (e.g. business-to-business data exchange
> formats), then translating the tags is not only overkill, it will
> completely break the system.

You didn't get my point. I was _not_ concerned about the user, but about
the different areas of concern on the Cocoon-powered site.

For example, consider a site where the style designers are Swedish, the
administrators are German and the journalists are spread across all the
European countries. Most of the journalists know neither German nor
Swedish, and little English. They happen to be very good writers in
their native languages and know a lot about volleyball.

Today, you have to do XSLT transformations to go from the Italian markup
to the English one.

True, this is a very simple XSLT template

 <xsl:template match="pagina"><page><xsl:apply-templates/></page></xsl:template>

or you could write

  <rule lang="it" from="pagina" to="page"/>

then translate this (which is more manageable) into XSLT and apply the
resulting stylesheet.

But this is so mechanical that it should be handled at the specification level.
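The "rules first, then generated XSLT" idea can be sketched in Java. The Map below stands in for a set of hypothetical `<rule from=... to=...>` elements; parsing those elements is left out. The generated stylesheet is applied with the JDK's built-in XSLT 1.0 processor through javax.xml.transform. Element names are assumed to be plain and valid, since no escaping is done.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class RuleCompiler {
    /** Generates an XSLT 1.0 stylesheet that renames elements according to from->to rules. */
    static String toXslt(Map<String, String> rules) {
        StringBuilder xsl = new StringBuilder(
                "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">");
        // Identity template: copies attributes and nodes that no rule applies to.
        xsl.append("<xsl:template match=\"@*|node()\">")
           .append("<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>")
           .append("</xsl:template>");
        // One template per rule; its default priority (0) beats the identity template's (-0.5).
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            xsl.append("<xsl:template match=\"").append(rule.getKey()).append("\">")
               .append("<xsl:element name=\"").append(rule.getValue()).append("\">")
               .append("<xsl:apply-templates select=\"@*|node()\"/>")
               .append("</xsl:element></xsl:template>");
        }
        return xsl.append("</xsl:stylesheet>").toString();
    }

    /** Compiles the rules to XSLT and applies the resulting stylesheet to xml. */
    static String apply(Map<String, String> rules, String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(toXslt(rules))));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> italian = new LinkedHashMap<>();
        italian.put("pagina", "page");
        italian.put("titolo", "title");
        italian.put("paragrafo", "paragraph");
        italian.put("grassetto", "bold");
        String doc = "<pagina xml:lang=\"it\"><titolo>Ciao a tutti!</titolo>"
                + "<paragrafo><grassetto>Ciao a tutti!</grassetto></paragrafo></pagina>";
        System.out.println(apply(italian, doc));
    }
}
```

The rule table is far easier to maintain than hand-written templates. The stylesheet stays a disposable build artifact.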

> This type of thing would also violate the spirit of XML, whose purpose
> is to provide standard, usable information.

I disagree. I proposed decoupling the semantic information from the
natural language used to express it in a schema. Two schemas may
have totally different element sets but express the _exact_ same
semantic structure.

For example, DocBook in English and DocBook (DocLibro?) in Italian. Mind
you: I didn't say "for English or for Italian" but "_in_ English and _in_
Italian".

As far as XML is concerned, these are different schemas, even if each
element is the translation of the corresponding element in the other.

> To use Microsoft's case for
> XML, we have a robot that goes to a site to get weather information.
> With HTML we observe that the information is in the 2nd table, 3rd cell.
> If the site designer has too much caffeine one night, our precious info
> is now in the 1st div on the page.  If the site had an XML representation,
> we would know to look for the info in the <weather/> tag.  If we
> start internationalizing the tags, then the information may be in the
> <weather/> tag for some people, but in a different tag for others.

Thanks, I think I know this :)

My point is: what if the weather information you get is marked up with
<tempo> instead of <weather>? How do you know it's still something
about weather?

I hear people saying RDF. Sure, that's it, RDF and RDF Schema. <tempo>
might be contained in an RDF statement, and the RDF Schema could say
that it extends <weather>, so both share the same semantic meaning.

But language identities are such a special case that they should be
made much simpler than this. RDF is and will remain a pain in the ass.

> That would create more chaos than it would solve.  I would venture to
> say that if your father is anything like mine, he couldn't care less
> what the markup looks like.

This is probably true :)

> As far as the sitemap is concerned, I still think i18n on that is too much.
> The sitemap is necessary for Cocoon to read.  If it used tags like <s/>
> and <p/> for <sitemap/> and <process/>, Cocoon wouldn't care as long
> as it can read it.  The longer names are necessary as long as we don't
> have a GUI to control the setup of the sitemap.

A GUI would provide a good filtering model for people who want this to
be i18n-ed... you're probably right that this much flexibility is too
much and asking for trouble.

> > This allows another level of separation of concerns, where the person
> > who creates the XSLT is an English designer and the person who writes
> > the XML document is an Italian journalist. (yes, the web site
> > triggered many of these thoughts)
> What happens when the situations are reversed?  I still say that i18n
> of the actual markup introduces too much complexity, too much room
> for human error, and too much difficulty in tracking down where the
> error lies.  Not to mention that it slows performance to a crawl.
> Simple "resource" based i18n works wonderfully for most situations,
> and takes very little time to process--and could potentially be easy
> to implement.  Anything above this level of i18n becomes very complex
> and almost impossible to follow.
> There is such a thing as taking a good idea too far.

Sure, I'm fully aware of this danger. This is why these are RT and not
"wisdom fragments" :)

> >                          ------------------ o ------------------
> >
> > Ok, but what can we do inside Cocoon without proprietary extensions
> > to the XML specifications?
> Simple resource files.
> > Also, how can we simplify the sitemap evolution without compromising the
> > rest of the system?
> See #2 above.
> > I think a possible solution is sitemap pluggability and compilation.
> >
> > You could think of the sitemap as a big XSP taglib that directly
> > drives the execution of the resource creation pipelines.
> Talk about learning curve.

No, nothing changes from the outside. The only difference is that we
don't write a sitemap interpreter; we write a sitemap compiler and keep
the sitemap pluggable, as we do for XSP and generators.

> > It would also increase performance, since matching could be optimized
> > and what not.
> It would?  How?

During compilation you have the whole sitemap at hand. You could
optimize paths, refactor pipelines, optimize conditionals, detect
sitemap mistakes and generate Java code that simply executes the
request/response for you, using the instructions in the sitemap as well
as in the components it uses.
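Here is one way compilation could speed up matching, sketched under several assumptions. Patterns are either exact URIs or a hypothetical prefix form ending in `*`. Rules are tried in document order, and the first match wins. Because the compiler sees every rule up front, exact patterns can go into a hash table while still giving the same first-match result an interpreter would produce.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CompiledMatcher {
    /** Hypothetical sitemap entry: an exact URI or a "prefix*" pattern, plus a pipeline name. */
    static final class Rule {
        final String pattern;
        final String pipeline;

        Rule(String pattern, String pipeline) {
            this.pattern = pattern;
            this.pipeline = pipeline;
        }

        boolean isWildcard() {
            return pattern.endsWith("*");
        }

        boolean matches(String uri) {
            return isWildcard()
                    ? uri.startsWith(pattern.substring(0, pattern.length() - 1))
                    : uri.equals(pattern);
        }
    }

    private final List<Rule> rules;
    private final Map<String, Integer> exactIndex = new HashMap<>(); // URI -> first exact rule
    private final List<Integer> wildcardIndices = new ArrayList<>(); // wildcard rules, in order

    /** The "compile" step: runs once, with the whole sitemap at hand. */
    CompiledMatcher(List<Rule> rules) {
        this.rules = rules;
        for (int i = 0; i < rules.size(); i++) {
            Rule r = rules.get(i);
            if (r.isWildcard()) {
                wildcardIndices.add(i);
            } else {
                exactIndex.putIfAbsent(r.pattern, i);
            }
        }
    }

    /** Interpreter baseline: tries every rule, in order, on every request. */
    static String interpret(List<Rule> rules, String uri) {
        for (Rule r : rules) {
            if (r.matches(uri)) return r.pipeline;
        }
        return null;
    }

    /** Same first-match result as interpret(), but exact URIs cost one hash lookup. */
    String match(String uri) {
        int best = exactIndex.getOrDefault(uri, Integer.MAX_VALUE);
        // Only wildcard rules declared before the exact hit can still win.
        for (int i : wildcardIndices) {
            if (i >= best) break;
            if (rules.get(i).matches(uri)) return rules.get(i).pipeline;
        }
        return best == Integer.MAX_VALUE ? null : rules.get(best).pipeline;
    }

    public static void main(String[] args) {
        List<Rule> sitemap = Arrays.asList(
                new Rule("news/today", "todaysNews"),
                new Rule("news/*", "newsArchive"),
                new Rule("about", "aboutPage"));
        CompiledMatcher compiled = new CompiledMatcher(sitemap);
        for (String uri : Arrays.asList("news/today", "news/1999", "about", "missing")) {
            System.out.println(uri + " -> " + compiled.match(uri)
                    + " (interpreter: " + interpret(sitemap, uri) + ")");
        }
    }
}
```

Only the wildcard rules declared before an exact hit are still scanned, so the result stays identical to the interpreter's. The gain grows with the number of exact URIs in the sitemap.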

At least during development this could be an invaluable feature to drive
> > It would also allow different sitemap schemas to be developed. In
> > theory, you could create your own sitemap schema.
> Danger, Will Robinson, Danger!

I know, I know. :)

> > Well, this collection of RT is admittedly wild.
> Agreed :P
> > Digest with caution, but think about it extensively, since I know a lot
> > of FS (flexibility syndrome) hides between the lines.
> I'll keep an open mind.
> I have to remember that small and lean doesn't always mean elegant and
> optimized.
> To pull an example from the analog audio world about the design techniques
> used by people of different nationalities:  American circuit designers
> believe that the shortest, simplest path for the audio to travel is best
> because every component introduced increases distortion.  British circuit
> designers, however, use as many components as it takes to counteract the
> distortion introduced by other components.  The end result is that British
> electronics sound warmer and more elegant while American electronics
> sound crisper and more sterile.  It is the difference between aiming for
> minimal distortion and aiming for distortion that is pleasing to the
> ear.  This analogy applies to professional electronics; I have no
> experience with British consumer gear.
> The way it applies here is that, with my American mentality, I am looking
> for the simplest, cleanest method to accomplish the same goal.  Stephano

STEFANO, damn it!!! "f" not "ph". Second time in the same message :)

> with a different mindset is proposing something that to the user can be
> more elegant and friendly.

I don't know. I need more feedback to find out... this is why I express
my thoughts as soon as they pop up.

Sometimes they are plain silly, but other times they have proved to be

Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
<>                             Friedrich Nietzsche
 Missed us in Orlando? Make it up with ApacheCON Europe in London!
------------------------- http://ApacheCon.Com ---------------------
