cocoon-dev mailing list archives

From "Neeme Praks" <ne...@one.lv>
Subject RE: [RT] i18n in Cocoon and language independent semantic contexts
Date Sun, 11 Jun 2000 14:35:55 GMT

> -----Original Message-----
> From: Stefano Mazzocchi [mailto:stefano@apache.org]
> Sent: Sunday, June 11, 2000 3:20 PM
> 
> The problems i18n poses are big and it's the reason why both Java and
> XML have Unicode support right from their core (a big advantage over
> almost all other programming languages).
> 
> Cocoon = Java + XML, so this means we need to place i18n support right
> into our core, or we'll be doomed by design limitations for the rest of
> its lifetime (and forced to do a cocoon3 to fix design problems).
> 
> Let's see those problems:
> 
> 1) internal messages: errors, logs, comments all should be driven by the
> JVM locale. Normally this is performed with Java ResourceBundles.
> 
> Is this enough? Should we create an XML version of those resource
> bundles? Is this following the golden-hammer antipattern of "do it
> all with XML"?

Good point. I have also been thinking about this and came to the
conclusion that it is better to use XML, so that you could mark up
your error messages (extreme examples: put in SVG graphics or some
equations in MathML ;-).
As long as there is no need to mark up error messages, ResourceBundles
should work just fine. But XML (as its name shows ;-) is more
extensible than ResourceBundles and thus more future-proof.

Actually, I'm not very familiar with ResourceBundles, but from their
name I conclude that they are text files consisting of key-value pairs...
Correct?
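
For the common properties-backed case that is roughly right: a
PropertyResourceBundle reads plain "key=value" text. A minimal sketch
follows, assuming made-up catalogs held in strings (the class name, the
key "error.notfound" and both messages are illustrative only, not
anything Cocoon defines):

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Locale;
import java.util.PropertyResourceBundle;

public class BundleSketch {
    // Hypothetical catalogs; normally these would be files such as
    // Messages_en.properties and Messages_it.properties.
    static final String EN = "error.notfound=Resource not found\n";
    static final String IT = "error.notfound=Risorsa non trovata\n";

    // Look up a key in the catalog for the locale's language,
    // falling back to the English catalog for any other language.
    static String message(Locale locale, String key) {
        String text = "it".equals(locale.getLanguage()) ? IT : EN;
        try {
            return new PropertyResourceBundle(new StringReader(text)).getString(key);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(message(Locale.ITALIAN, "error.notfound"));
        System.out.println(message(Locale.ENGLISH, "error.notfound"));
    }
}
```

The values are flat strings, which is exactly the limitation mentioned
above: there is no place to put markup inside a message.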

> 2) uri space: good URIs don't change and are human readable. The sitemap
> allows you to enforce the first (if you don't use extensions to indicate
> your resources), and your URI-space design should enforce the second
> one.
> 
> Be careful, something like "/news/today" is a perfectly designed URI for
> a website and can stand ages without requiring to change. But it's not
> human readable by non-English speakers. The Italian equivalent would be
> "/notizie/oggi".
> 
> This leads to something that was already expressed on the list: can the
> sitemap enforce different views of the same URI space based on
> i18n issues? What's the most manageable way to do this? Where does
> separation of concerns come in here? What's the best way to scale such
> a thing?
>
> And, most important, is something like this worth the effort? (I've
> never seen translated URI spaces, is there a web site that does this?)

I haven't seen any site with multiple-language URLs either (although I
have been thinking about implementing the first one :-)... but I think
that this is the problem with current web development technologies: the
URL makes it very visible what technologies you are using, what your
coding and file-naming conventions are, etc.

And that is exactly the thing we are trying to change with Cocoon2,
right?
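
On the translated URI space question, one way to picture it is a
per-language table that maps each path segment back to a canonical
segment before any matching happens. This is only a sketch under that
assumption; the class, the table and its two entries are hypothetical
and say nothing about how the sitemap would really do it:

```java
import java.util.HashMap;
import java.util.Map;

public class UriAliasSketch {
    // Hypothetical Italian-to-canonical segment table.
    static final Map<String, String> IT_TO_CANONICAL = new HashMap<>();
    static {
        IT_TO_CANONICAL.put("notizie", "news");
        IT_TO_CANONICAL.put("oggi", "today");
    }

    // Rewrite each path segment to its canonical form;
    // segments without an entry pass through unchanged.
    static String canonicalize(String uri) {
        StringBuilder out = new StringBuilder();
        for (String segment : uri.split("/")) {
            if (segment.isEmpty()) continue; // skip the empty piece before the leading slash
            out.append('/').append(IT_TO_CANONICAL.getOrDefault(segment, segment));
        }
        return out.length() == 0 ? "/" : out.toString();
    }

    public static void main(String[] args) {
        System.out.println(canonicalize("/notizie/oggi"));
        System.out.println(canonicalize("/news/today"));
    }
}
```

With this, "/notizie/oggi" and "/news/today" end up at the same
canonical URI, so the resources behind them only need to be declared
once.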

> 3) schemas: this is something I've been concerned about for quite some
> time and maybe some of you who were into the SGML world before can give
> us advice. A schema has one embedded natural language.
> 
>  <page xml:lang="en">
>   <title>Hello World!</title>
>   <paragraph>
>    <bold>Hello World!</bold>
>   </paragraph>
>  </page>
> 
> can be translated into
> 
>  <page xml:lang="it">
>   <title>Ciao a tutti!</title>
>   <paragraph>
>    <bold>Ciao a tutti!</bold>
>   </paragraph>
>  </page>
> 
> but this _requires_ authors to understand English to understand the
> markup. The real translation is
> 
>  <pagina xml:lang="it">
>   <titolo>Ciao a tutti!</titolo>
>   <paragrafo>
>    <grassetto>Ciao a tutti!</grassetto>
>   </paragrafo>
>  </pagina>
> 
> which could easily pass my "father's test" (he doesn't speak English),
> while the previous one would not.
> 
> Are those pages different? No, they are different views of the same
> information.

[snip]

> So, let us suppose there exists one schema and the reference schema is
> written in english.
> 
> It should be possible to introduce a view of this schema by allowing
> semantic inheritance of the elements.
> 
> Let's make an example:
> 
>  <page:page xml:lang="en" xmlns:page="urn:page" 
> xmlns:style="urn:style">
>   <page:title>Hello World!</page:title>
>   <page:paragraph>
>    <style:bold>Hello World!</style:bold>
>   </page:paragraph>
>  </page:page>
> 
> and we want to translate this into HTML so we need page->html and
> markup->html (supposing page doesn't contain the equivalent of "style"
> semantic information)
> 
> Now we want this to be readable for Italians that don't know English, but
> want to keep the same stylesheets. How could we achieve that?
> 
> I have a solution that requires (unfortunately) patching both the
> namespace and XMLSchema specifications:
> 
>  <pagina:pagina xml:lang="it" 
>     xmlns:pagina="urn:page" xmlns:pagina:lang="it" 
>     xmlns:stile="urn:style" xmlns:stile:lang="it">
>   <pagina:titolo>Ciao a tutti!</pagina:titolo>
>   <pagina:paragrafo>
>    <stile:grassetto>Ciao a tutti!</stile:grassetto>
>   </pagina:paragrafo>
>  </pagina:pagina>
> 
> where the XMLSchema should indicate that
> 
>  <pagina> -(equals)-> <page>
>  <titolo> -(equals)-> <title>
>  <paragrafo> -(equals)-> <paragraph>
> 
> and all create different natural languages views of the same namespace
> (urn:page) while
> 
>  <grassetto> -(equals)-> <bold>
> 
> for the namespace (urn:style).
> 
> Then, it can be possible for XML parsers to map all those elements in
> "language-neutral semantic equivalent classes" where XPaths can access
> them independently of their natural language form.
> 
> For example, the XPath "/page/title" should return "Ciao a tutti!" if
> applied to the Italian version of the page and "Hello World!" if applied
> to the English version (version indicated with xml:lang), but should be
> transparent on the language used to present the schema elements.

Makes sense. However, it should also be possible to get the same result
with XPath "/pagina/titolo"?
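
To make the idea concrete, here is a rough sketch of resolving a simple
path through such equivalence classes, so that "/page/title" and
"/pagina/titolo" select the same node in either document. It assumes a
hand-written alias table (the proposal has the schema supply it),
ignores namespaces, and uses a tiny path walker rather than a real
XPath engine; all names are illustrative:

```java
import java.io.StringReader;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NeutralPathSketch {
    // Hypothetical equivalence table: localized name -> canonical name.
    static final Map<String, String> ALIASES = Map.of(
            "pagina", "page", "titolo", "title",
            "paragrafo", "paragraph", "grassetto", "bold");

    static String canonical(String name) {
        return ALIASES.getOrDefault(name, name);
    }

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // First child element whose canonical name matches, or null.
    static Node childNamed(Node parent, String canonicalName) {
        NodeList kids = parent.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node kid = kids.item(i);
            if (kid.getNodeType() == Node.ELEMENT_NODE
                    && canonical(kid.getNodeName()).equals(canonicalName)) {
                return kid;
            }
        }
        return null;
    }

    // Walk a path like "/page/title", comparing canonical names on both
    // sides; returns the text of the selected element, or null if no match.
    static String select(Document doc, String path) {
        Node current = doc;
        for (String step : path.split("/")) {
            if (step.isEmpty()) continue;
            current = childNamed(current, canonical(step));
            if (current == null) return null;
        }
        return current.getTextContent();
    }

    public static void main(String[] args) {
        Document it = parse("<pagina><titolo>Ciao a tutti!</titolo></pagina>");
        Document en = parse("<page><title>Hello World!</title></page>");
        System.out.println(select(it, "/page/title"));
        System.out.println(select(en, "/pagina/titolo"));
    }
}
```

Because both the path steps and the element names are canonicalized
before comparison, it does not matter which language either side is
written in.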

Also, this raises another topic I have been thinking about: how would
you store the different versions of the same document? Embed everything
in one tree or separate into different trees?

Something like this:
<doc>
  <pagina:pagina xml:lang="it" 
     xmlns:pagina="urn:page" xmlns:pagina:lang="it" 
     xmlns:stile="urn:style" xmlns:stile:lang="it">
   <pagina:titolo>Ciao a tutti!</pagina:titolo>
   <pagina:paragrafo>
    <stile:grassetto>Ciao a tutti!</stile:grassetto>
   </pagina:paragrafo>
  </pagina:pagina>
  <page:page xml:lang="en" 
     xmlns:page="urn:page" xmlns:page:lang="en" 
     xmlns:style="urn:style" xmlns:style:lang="en">
   <page:title>Hello World!</page:title>
   <page:paragraph>
    <style:bold>Hello World!</style:bold>
   </page:paragraph>
  </page:page>
</doc>

or something like this:

<doc>
  <page:page>
   <page:title>
      <text>
        <en>Hello World!</en>
        <it>Ciao a tutti!</it>
      </text>
   </page:title>
   <page:paragraph>
    <style:bold>
      <text>
        <en>Hello World!</en>
        <it>Ciao a tutti!</it>
      </text>
    </style:bold>
   </page:paragraph>
  </page:page>
</doc>
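
With the second layout, producing a single-language view means
replacing every <text> element with the variant for the wanted language.
A minimal sketch of that step, assuming un-prefixed element names and a
compact sample document of my own (the class and method names are
hypothetical):

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class LanguageViewSketch {
    static final String MERGED =
            "<page><title><text><en>Hello World!</en>"
            + "<it>Ciao a tutti!</it></text></title></page>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Replace each <text> element by the text of its <lang> child.
    // <text> elements without that language are left untouched.
    static void flatten(Document doc, String lang) {
        NodeList live = doc.getElementsByTagName("text");
        // The NodeList is live, so take a snapshot before mutating the tree.
        List<Element> texts = new ArrayList<>();
        for (int i = 0; i < live.getLength(); i++) {
            texts.add((Element) live.item(i));
        }
        for (Element text : texts) {
            NodeList variants = text.getElementsByTagName(lang);
            if (variants.getLength() == 0) continue;
            String value = variants.item(0).getTextContent();
            text.getParentNode().replaceChild(doc.createTextNode(value), text);
        }
    }

    // Convenience for the example: the title text in the given language.
    static String titleIn(String lang) {
        Document doc = parse(MERGED);
        flatten(doc, lang);
        return doc.getElementsByTagName("title").item(0).getTextContent();
    }

    public static void main(String[] args) {
        System.out.println(titleIn("it"));
        System.out.println(titleIn("en"));
    }
}
```
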

This is more of a concern at the storage level. When authoring, it is
natural to use the most convenient tags...
Why would I like to use the latter approach? Well, it forces the
documents in different languages to have the same structure. But then
again, this might not be such a good idea... as different languages can
have different sentence structure, the exact structure of the document
might deviate as well. Also, XML is perfect for unstructured documents, so
why make things so structured...
A better solution for enforcing the document structure would be to
generate a kind of template (XML Schema?) from the original document, so
that when translating you would get warnings instead of errors when
deviating from the original structure. Just RTs...
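
As a rough illustration of the "warnings instead of errors" idea, the
sketch below derives a list of element paths from the original and
reports where a translation deviates, without rejecting it. It assumes
both documents use the same tag vocabulary and uses a plain path list in
place of a generated XML Schema; every name in it is hypothetical:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class StructureCheckSketch {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Collect the path of every element, e.g. "/page/paragraph/bold".
    static List<String> shape(Node node, String prefix, List<String> out) {
        NodeList kids = node.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node kid = kids.item(i);
            if (kid.getNodeType() != Node.ELEMENT_NODE) continue;
            String path = prefix + "/" + kid.getNodeName();
            out.add(path);
            shape(kid, path, out);
        }
        return out;
    }

    // Compare the two shapes and describe each deviation as a warning.
    static List<String> warnings(String originalXml, String translatedXml) {
        List<String> expected = shape(parse(originalXml), "", new ArrayList<>());
        List<String> actual = shape(parse(translatedXml), "", new ArrayList<>());
        List<String> result = new ArrayList<>();
        for (String path : expected) {
            if (!actual.contains(path)) result.add("warning: missing " + path);
        }
        for (String path : actual) {
            if (!expected.contains(path)) result.add("warning: unexpected " + path);
        }
        return result;
    }

    public static void main(String[] args) {
        String original = "<page><title>Hello</title>"
                + "<paragraph><bold>Hello</bold></paragraph></page>";
        String translated = "<page><title>Ciao</title>"
                + "<paragraph>Ciao <italic>a tutti</italic></paragraph></page>";
        warnings(original, translated).forEach(System.out::println);
    }
}
```

The translation is still accepted; the translator just gets told where
it drifted from the original.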

[snip]

>                          ------------------ o ------------------
> 
> Ok, but what can we do inside Cocoon without having to extend the XML
> specifications in a proprietary way?

How about proposing these extensions to the XML Schema WG as well, for
the next version of XML Schema?

> Also, how can we simplify the sitemap evolution without compromising
> the rest of the system?
> 
> I think a possible solution is sitemap pluggability and compilation.
> 
> You could think of the sitemap as a big XSP taglib that is responsible
> for directly driving the execution of the resource creation pipelines.
> 
> It would also increase performance, since matching could be optimized
> and what not.
> 
> It would also allow different sitemap schemas to be developed. In
> theory, you could create your own sitemap schema.

Well, if I understood you correctly, you basically propose to make
the sitemap an XSP page?
Wasn't this already discussed?
Well, the power of this solution would be enormous; I'm having
difficulty imagining the true amount of power this would actually give
us ;-)

> Well, this collection of RT is admittedly wild.

This is what I like the most about them ;-)

Neeme
