forrest-dev mailing list archives

From Sjur Nørstebø Moshagen <sjur.mosha...@kolumbus.fi>
Subject Re: i18n suggestion
Date Thu, 18 Mar 2004 10:34:33 GMT

On 17 Mar 2004, at 19:24, Nicola Ken Barozzi wrote:

> Upayavira wrote:
> ...
>> The question is: what to do if there is a translated page that isn't 
>> accessible via a direct route from the homepage? This page wouldn't 
>> be found if we just followed links from a language's index page.
>
> This is a more general point: should we generate only pages that are 
> accessible through crawling?
>
> Short info for others: Cocoon is basically a function that transforms 
> some input into output through xml. It is generally not a trivially 
> invertible function, as it's not always easy (or possible in some 
> cases) to define the result space from the source space. For example, 
> if a matcher matches *my.html, and generates using my.xml, we will 
> have infinite possible result docs, named ciaomy.html, himy.html, 
> themy.html, etc.
>
> A pragmatic solution is to statically generate, similarly to wget, 
> only the pages that are reachable from an initial set, usually the 
> homepage. In fact this makes sense, as site contents should always be 
> reachable from the homepage.

The question is: how do you want to treat language/locale variation on 
a file level? What you suggest would imply a scheme like this (links 
are indicated with ->):

foo.xml (-> bar.xml, baz.xml)
bar.xml
baz.xml

foo_de.xml (-> bar_de.xml, baz_de.xml)
bar_de.xml
baz_de.xml

If you start crawling at the root, first with foo.xml, then with 
foo_de.xml, your scheme will work.

The problem is, we should rely on content negotiation as the main tool 
for selecting the correct version among several files. Language menus 
are only a secondary tool in cases where content negotiation does not 
give the wanted result, and should normally not be necessary to use. 
This implies the following change compared to the example above:

foo.xml (-> bar.xml, baz.xml)
bar.xml
baz.xml

foo_de.xml (-> bar.xml, baz.xml)
bar_de.xml
baz_de.xml

That is, you always link only to the base version of a file, relying 
on the content negotiation capabilities of your server to pick the 
correct version in each case. This breaks crawling that merely follows 
the links on a page: the only localised page you would get is the 
front page.
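
To make the file-level effect of content negotiation concrete, here is a minimal sketch in Python. It assumes the _<lang> file naming used in the examples above and treats the user's language preferences as a plain ordered list; real HTTP negotiation (q-values, language ranges) is more involved. The function name negotiate and its arguments are hypothetical, not anything from Cocoon, Forrest or httpd.

```python
def negotiate(base, accepted_langs, available):
    """Pick the file to serve for a link to `base` (e.g. 'bar.xml').

    `accepted_langs` is the user's language list in order of preference,
    `available` is the set of file names that exist. Both are simplified
    stand-ins for what a real server would use.
    """
    stem, ext = base.rsplit(".", 1)
    for lang in accepted_langs:
        candidate = f"{stem}_{lang}.{ext}"
        if candidate in available:
            return candidate
    # No localised variant matches: serve the default-locale file.
    return base


files = {"foo.xml", "bar.xml", "baz.xml",
         "foo_de.xml", "bar_de.xml", "baz_de.xml"}

print(negotiate("bar.xml", ["de", "en"], files))  # bar_de.xml
print(negotiate("bar.xml", ["fr"], files))        # bar.xml
```

The link itself always says bar.xml; which file is served depends only on the request, which is why a crawler that sends no language preference never sees bar_de.xml.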

The ability to rely on content negotiation is also important in a 
scenario where you have several editors working on different languages. 
You can't expect all editors to do translations from the root down 
(even if you have asked them to), thus you may end up with a file 
bar_de.xml somewhere in the tree that is _not_ linked to, even if we 
follow your scheme. If, on the other hand, we rely on content 
negotiation and just refer to the file as bar.xml, it would be 
available immediately.

That is, the crawling has to start at the root without any locale 
request (you ask for foo.xml), follow only the non-localised links, and 
at each link/file/point, look for localised source files beside the 
default file. This looking can either be restricted by a crawling 
parameter ('look for the following locales'), or be open-ended ('when 
you find a file foo.xml, also process all foo_*.xml').
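
A rough sketch of that crawl in Python, under some loud assumptions: the link structure is given as a plain dict from base file to linked base files, the set of existing files is known up front, and the names variants and crawl are made up for illustration. It is not how the Cocoon crawler is actually implemented.

```python
from fnmatch import fnmatch


def variants(base, available, locales=None):
    """Localised files sitting next to `base` (e.g. 'foo.xml')."""
    stem, ext = base.rsplit(".", 1)
    if locales is not None:
        # Restricted mode: 'look for the following locales'.
        wanted = [f"{stem}_{loc}.{ext}" for loc in locales]
        return sorted(name for name in wanted if name in available)
    # Open-ended mode: 'also process all foo_*.xml'.
    return sorted(n for n in available if fnmatch(n, f"{stem}_*.{ext}"))


def crawl(root, links, available, locales=None):
    """Follow only non-localised links, adding variants at each file."""
    seen, queue, output = set(), [root], []
    while queue:
        page = queue.pop(0)
        if page in seen or page not in available:
            continue
        seen.add(page)
        output.append(page)
        output.extend(variants(page, available, locales))
        queue.extend(links.get(page, []))
    return output


links = {"foo.xml": ["bar.xml", "baz.xml"]}
available = {"foo.xml", "bar.xml", "baz.xml",
             "foo_de.xml", "bar_de.xml", "baz_de.xml", "bar_no.xml"}

print(crawl("foo.xml", links, available))
print(crawl("foo.xml", links, available, locales=["de"]))
```

Note that the open-ended pattern would also pick up a file such as foo_notes.xml, so a real implementation would probably want to check that the suffix is a known locale code.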

> Alternatives are to define an easy mapping from source files like 
> Anakia does and stick to that. In this way by reading a source dir we 
> already have the resulting files. But since we want a totally 
> interlinked site, do we really need this?

I don't know Anakia, but it sounds similar to what I have described. 
And yes, we need it, because interlinking should not be between 
_localised_ files, only between _files_. The localisation is so to 
speak sitting "parasitic" on top of the default locale.

> I would simply add, beside the "pdf" and "print" buttons a "lang" 
> section where we can select the language. In this way all pages would 
> be easily reachable.

Agreed, but it isn't that simple. As stated above, a "lang" section 
should only be used to _override_ whatever the content negotiation 
process gave you; it should not be a link to e.g. the "German" version 
of the site. This overriding can be done in a couple of ways:

1) on a page-by-page basis:

Page:          Lang section links to:
foo.html       foo_de.html
               foo_es.html
               ...

This is a very simple way of lang-linking, but is enough for occasional 
use.

2) on a site level: instead of linking to a specific localised version 
of a page, one could set a cookie, as described in the i18n javadocs of 
Cocoon, and in a servlet/Cocoon context, this cookie would override 
standard request-header content negotiation. AFAIK, this solution will 
not work for a static site, since Apache (httpd) does not make use of 
cookies in this way. Thus, your users should know how to change their 
browser language preferences to get the best experience.

3) Also using Cocoon/Forrest in a servlet context, one can use CGI 
parameters instead of cookies.

4) I am sure there are other ways of doing this.
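
The common idea behind 2) and 3) is that an explicit choice wins over the request headers. A minimal sketch in Python, where pick_locale and its arguments are hypothetical names and the override value is assumed to come from either a cookie or a request parameter; this does not reproduce the actual Cocoon i18n API:

```python
def pick_locale(available, accept_language, override=None):
    """Return the locale to serve, or None for the default locale.

    `override` is whatever the lang section stored (cookie or request
    parameter); `accept_language` is the browser's ordered preference
    list, simplified to plain language codes.
    """
    if override in available:
        return override
    for lang in accept_language:
        if lang in available:
            return lang
    return None


available = {"de", "en", "no"}

print(pick_locale(available, ["no", "en"]))                 # no
print(pick_locale(available, ["no", "en"], override="de"))  # de
print(pick_locale(available, ["fr"], override="es"))        # None
```

An override that names a locale with no translation simply falls through to the header-based choice.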

A different issue is how to build the lang section. Independent of 
which overriding method one uses, the lang section should reflect the 
available versions of the current page. One way of doing it would be to 
generate a separate langmenu file, parallel to what is done now for 
the ordinary menus. Thus, with the following page and versions:

foo.xml
foo_de.xml
foo_en.xml
foo_no.xml

one would get an intermediate file

langmenu-foo.xml

listing de,en,no as available languages. With some further processing, 
it can easily be turned into a section as you describe it.
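
As an illustration only, generating such an intermediate file could look like the Python sketch below. The element and attribute names (langmenu, lang, code, href) are invented for the example; nothing here is an existing Forrest format.

```python
import xml.etree.ElementTree as ET


def langmenu(base, available):
    """Build (file name, XML text) listing the locales found for `base`."""
    stem, ext = base.rsplit(".", 1)
    prefix, suffix = stem + "_", "." + ext
    langs = sorted(
        name[len(prefix):-len(suffix)]
        for name in available
        if name.startswith(prefix) and name.endswith(suffix)
    )
    root = ET.Element("langmenu", {"page": base})
    for lang in langs:
        if lang:  # skip a stray 'foo_.xml'
            ET.SubElement(root, "lang",
                          {"code": lang, "href": f"{stem}_{lang}.html"})
    return f"langmenu-{stem}.xml", ET.tostring(root, encoding="unicode")


files = {"foo.xml", "foo_de.xml", "foo_en.xml", "foo_no.xml", "bar.xml"}
name, xml_text = langmenu("foo.xml", files)
print(name)
print(xml_text)
```

The href values point at the page-by-page targets from option 1); with a cookie or parameter override they would instead carry the locale code alone.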

> The question is how to do this: combobox, list with css, list with 
> javascript?

I would prefer list with css, but this should be configurable, and the 
needs will depend on the number of locales you want to support, as well 
as other issues.

Sjur

