forrest-dev mailing list archives

From Sjur Nørstebø Moshagen <>
Subject Re: i18n errors, bad downgrading
Date Thu, 11 Mar 2004 14:13:09 GMT
[WARNING: lengthy message]

On 11 Mar 2004, at 11:58, Upayavira wrote:

> Well, you can start by explaining to me how I18N works ;-)

I am no expert, so the following is only what I've understood from  
documentation on http, apache and cocoon. The answer is maybe too basic  
and detailed on one hand - skip whatever is known to you - and too  
simplistic and lacking details on the other.

First of all, there are two different perspectives on i18n when talking  
about web applications:

- the client view
- the server view

The two views are tied together with _content negotiation_ using  
locales, trying to find the best match between what the client wants  
and what is actually available.

The client
The client sends an ordered list of locales. This is like a wish list,
with the highest priority on top: the first locale should be returned  
from the server if available, if not, try the next, and so on. In  
principle, language versions outside this list should never be served  
even if available, but in practice a default version is served (the  
notion of "default" seems to vary between different servers).

The main idea is that the client/user can use this list to specify  
which locales/languages are acceptable to her/him. This implies that
_all_ and _only_ the locales in the list are acceptable.
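
In HTTP the ordered list travels in the Accept-Language request header, where each entry may carry a q weight (1.0 if omitted). As a minimal sketch of how a server could turn that header into the priority-ordered list described above (the function name is only illustrative, not something taken from Apache or Cocoon):

```python
def parse_accept_language(header: str) -> list[str]:
    """Return locale tags ordered from most to least preferred."""
    entries = []
    for position, item in enumerate(header.split(",")):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        quality = 1.0  # default weight when no q parameter is given
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            if name == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat an unreadable weight as "not acceptable"
        if quality > 0:
            # Sort by descending weight, keeping header order for ties.
            entries.append((-quality, position, tag.strip()))
    return [tag for _, _, tag in sorted(entries)]


print(parse_accept_language("se, no;q=0.8, sv;q=0.6, en;q=0.4"))
```

Entries with q=0 are dropped, which matches the idea that only the locales in the list are acceptable.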

The server

The server has a set of localized versions. This set will most likely  
not be the same as the locale list coming from the client. The server  
should go through each locale value from the client starting from the  
one with highest priority, and see if there is a resource available for  
that locale, until it finds a match (there are more details to this,  
see below). THIS IS WHERE FORREST FAILS TODAY. It fails in both these
cases:

- forrest: fails on build, because not all locale variants are
     available for all files; it is unclear to me how the set of
     locales is defined, but it seems it just takes the
     xdocs/index.xml as a starting point (others know more about this)
- forrest run: starts just fine, but when going to a page without a
     version corresponding to the browser's highest priority, cocoon
     throws an error:

Internal Server Error
Message: null
Description: No details available.
Sender: org.apache.cocoon.servlet.CocoonServlet
Source: Cocoon Servlet
Request URI
xdocs/gram/index_no.xml (No such file or directory)
Apache Cocoon 2.1.4

I suspect it fails in the following two cases as well:
- forrest war: not tested yet, but should be similar to forrest run
- forrest webapp: not tested yet, but should be similar to forrest run

A simple example for how it should work:

Client sends a request for index.html with the following locales: se,  
no, sv, en
Server has the following files (I'm using apache conventions):

The first (and only) match is on, and it should be served  
to the client as index.html

The locales

A locale specification is built from the following components:
- language: an ISO 639 code
- country: an ISO 3166 code
- variant: a language variant
- encoding: an encoding specification such as utf-8, 8859-1, etc.
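
To make the four components concrete, here is a small sketch that splits a locale string into them. It assumes the POSIX-style spelling language[_COUNTRY][.encoding][@variant]; as noted elsewhere in this message, other contexts concatenate the parts differently (an Apache-style tag such as ru-utf8 would not parse here), so the class and function names are purely illustrative.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed spelling: language[_COUNTRY][.encoding][@variant]
_LOCALE = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:[_-](?P<country>[A-Za-z]{2}))?"
    r"(?:\.(?P<encoding>[A-Za-z0-9-]+))?"
    r"(?:@(?P<variant>[A-Za-z0-9]+))?$"
)


@dataclass(frozen=True)
class Locale:
    language: str                   # ISO 639 code
    country: Optional[str] = None   # ISO 3166 code
    encoding: Optional[str] = None  # e.g. utf-8
    variant: Optional[str] = None   # language variant


def parse_locale(text: str) -> Locale:
    match = _LOCALE.match(text)
    if match is None:
        raise ValueError(f"unrecognised locale: {text!r}")
    parts = match.groupdict()
    return Locale(
        language=parts["language"].lower(),
        country=parts["country"].upper() if parts["country"] else None,
        encoding=parts["encoding"],
        variant=parts["variant"],
    )


print(parse_locale("pt_PT"))
print(parse_locale("ru_EE.utf-8"))
```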

There are several documents on the web describing the details. Here are  

There are also differences on the exact concatenation of the locale  
parts and the placement of the locale string in a resource name,  
depending on context (for example Apache and Cocoon uses different  
schemes). Normally though, the conversion from one format to another is  
handled automatically, you just have to know the correct form for the  
server you are authoring resources for.

Content negotiation

As said earlier, there are more details to content negotiation that
complicate the picture a bit. Because locales can be more complex than
simple language tags, you might have for example (taken from the Apache  
test page):


The basic idea is that the server should try to find the best match  
according to _two_ different axes: the user's list of preferred locales
(as described above, locale details below), and a degradation scheme as  


If the client requests index.html with a locale list as follows:  
ru-utf8, the server should return ("disguised" as  
index.html). No degradation required in this case. If the client  
requests pt-PT (Portuguese as in Portugal), the server should look for  
it, find no match, try just pt, find a match and serve it. This makes a  
successful degradation from a country-specific version to a general pt  

One more example:
Client asks for index.html, and sends these locales: pt_PT, se, en
Server has no match for pt_PT, degrades to pt, and serves that, even  
though it has the other locales requested as well.
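
The two axes can be sketched in a few lines: walk the client's list in priority order, and for each entry try the full locale first and then its more general forms. This is only an illustration of the behaviour described here, not how Apache or Cocoon actually implement it; the function names and the sets of available locales are made up for the example.

```python
from typing import Optional


def degradations(locale: str) -> list[str]:
    """Return the locale followed by its more general forms (pt_PT, then pt)."""
    parts = locale.replace("-", "_").split("_")
    return ["_".join(parts[:end]) for end in range(len(parts), 0, -1)]


def negotiate(requested: list[str], available: set[str]) -> Optional[str]:
    """Pick the first available locale, degrading each request before moving on."""
    for locale in requested:
        for candidate in degradations(locale):
            if candidate in available:
                return candidate
    return None  # nothing acceptable; the server must decide what to do


# The example above: pt_PT degrades to pt before se or en are considered.
print(negotiate(["pt_PT", "se", "en"], {"pt", "se", "en"}))
```

Returning None when nothing matches leaves the choice of a default to the caller.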

This is the basic outline of how locale-sensitive content negotiation  
should work, and what I expected from Forrest.

What happens if the client has a locale setting such as ru_EE? That is,
Russian in Estonia, no encoding specified. Or just ru (Russian), without
any encoding information or any other information? Which version should  
be served to the client? I don't know, but Apache does, as does Cocoon.

This is _my_ main question: since Cocoon knows how to deal with content  
negotiation, why doesn't Forrest? (I am pretty ignorant of the  
relationship between forrest and cocoon.)

> I have a site that is currently in English. I've got Polish content to  
> add. It is built with Cocoon, but generated with the Cocoon Ant task  
> (same stuff as CLI). I plan to extend the CLI to be able to generate  
> multiple versions of the same page, by allowing something like:
>   <uris name="docs" follow-links="true" locales="en,pl">
>     <uri type="insert" src-prefix="docs/" src="index.html"
>          dest="build/dest/*{locale}" />
>   </uris>
> What this says is, generate the site twice, each time starting  
> crawling at index.html, first time with a locale set to en, second  
> time with it set to pl.
> When it has created a page, say buddhism.html, it would create the  
> filename for this page from the specified destination of  
> build/dest/*.{locale}. So, it would become  
> build/dest/buddhism.html.en, or build/dest/ I believe  
> this suits Apache's I18N functionality.

It should.
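
For illustration only, the filename expansion you describe could look like the sketch below. This is not the Cocoon CLI's code; the function name is hypothetical, and it simply assumes that "*" stands for the generated page name and "{locale}" for the current locale.

```python
def destination(pattern: str, page: str, locale: str) -> str:
    """Expand a destination pattern such as build/dest/*.{locale}."""
    return pattern.replace("*", page).replace("{locale}", locale)


# One pass over the site per locale listed in locales="en,pl".
for locale in "en,pl".split(","):
    print(destination("build/dest/*.{locale}", "buddhism.html", locale))
```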

> Now, I don't know what really constitutes a 'locale', and what sorts  
> of strings I might expect a locale to be. Some samples would really  
> help me.

I hope what I wrote above has clarified things a little.

One more example - the site I'm working on:

It should be a portal for electronic dictionaries and terminology  
serving several minority languages in Norway, Sweden, and Finland (in  
the future possibly also North-West Russia). The site has several  
sections, for which there are different language needs. The main  
sections are (languages/locales in parentheses):

- the public portal (se, smj, sma, sms, smn, no, sv, fi, en?)
- internal administrative stuff (se, no, others?)
- technical documentation (en, no?)

Up front it is clear that not all languages will be used on all pages  
on the site, and it is also clear that not all pages will be translated  
and released simultaneously.

What I planned to do was to set up a minimal version of the site, and  
then let other editors populate it with documents in different  
languages (I don't speak even half the languages). There will be one  
editor for each language, translating and publishing that language's  
version at his/her own pace.

I develop the site on my own computer and build a war file, which is then
deployed on the site host in Tomcat. The editors will need access to  
the deployed Tomcat site for installing new versions and documents  
(details here still to be worked out).

I want to do all in XML, both for increased reusability and greater  
flexibility. I have set up a simple near-WYSIWYG XML editing facility  
for the editors using XXE ( and the  
Forrest DTDs, and plan to make the publishing process very easy and  
lightweight for the editors.

For a site like this to work well for its users, it depends on
locale content negotiation working, as well as the possibility for a
user to set his or her favorite language for the site (as a cookie, or  
at least as the session language/locale). This way, it should be  
possible for users to get their language(s), in decreasing order of  
preference depending on availability. The experience to the users  
should be reasonably pleasant, as the site should always be able to  
serve a language they can understand, or easily shift to one if needed.
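
A rough sketch of how the user's own choice could be combined with the browser's list: the site-level choice (however it is stored, cookie or session) goes first, followed by the browser's locales in their original order. The function and parameter names are assumptions for the example, not an existing Cocoon or Forrest feature.

```python
from typing import Optional


def effective_preferences(browser: list[str], user_choice: Optional[str] = None) -> list[str]:
    """Put the user's explicit site language ahead of the browser's list."""
    ordered: list[str] = []
    for locale in (user_choice, *browser):
        if locale and locale not in ordered:  # skip a missing choice and duplicates
            ordered.append(locale)
    return ordered


print(effective_preferences(["no", "en"], user_choice="se"))
```

The resulting list can then be used for negotiation in exactly the same way as the plain browser list.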

Forrest suits my (other) needs very well, except that I didn't check  
its i18n features well enough before starting to use it. Thus I won't  
give up on Forrest lightly, but rather try to help develop it into the
tool I need.

> Forresters - I presume that if, when Cocoon is generating a page, the  
> environment's locale is set correctly to whatever is provided to the  
> CLI, in this case en or pl, that this would work well with Forrest's
> I18N functionality. Or am I missing something.

One question that comes to mind, is: what happens when a page is _not_  
available in the requested locale? Presently Cocoon/Forrest fails. If  
given only one locale, it can be argued that this is correct behaviour  
for a server, although a better approach would be to have a default
fallback language/file (which you would need to have anyway - if not,  
what would you serve a user without a locale specification at all, or  
with only locales that are foreign to your site?). The fallback file  
could either be the regular content in the language decided upon by the  
webmaster, or a simple page telling the user that the page/site is only  
available in some specified languages, perhaps with a link to a page  
explaining how to set the languages/locales for a browser.
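
As a sketch of the fallback idea (names are hypothetical, and degradation is left out to keep it short): return the webmaster's default when none of the requested locales exists, and report that a fallback was used so the server can add a note about the available languages.

```python
def choose_with_fallback(requested: list[str], available: set[str], default: str) -> tuple[str, bool]:
    """Return (locale, used_fallback) for a page."""
    if default not in available:
        raise ValueError("the default locale must exist for every page")
    for locale in requested:
        if locale in available:
            return locale, False
    # No requested locale exists (or none was sent): serve the default.
    return default, True


print(choose_with_fallback(["pl"], {"en"}, default="en"))
```

The check on the default makes the requirement explicit: a fallback only helps if that version exists for every page.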

> Regards, Upayavira

