forrest-dev mailing list archives

From Upayavira ...@upaya.co.uk>
Subject Re: i18n suggestion
Date Fri, 19 Mar 2004 12:21:20 GMT
Nicola Ken Barozzi wrote:

> Upayavira wrote:
> ...
>> The question is - what to do if there is a translated page that isn't 
>> accessible via a direct route from the homepage. This page wouldn't be 
>> found if we just followed links from a language's index page.
>
>
> This is a more general point: should we generate only pages that are 
> accessible through crawling?
>
> Short info for others: Cocoon is basically a function that transforms 
> some input into output through XML. In general it is not trivially 
> invertible, as it's not always easy (or possible in some cases) to 
> define the result space from the source space. For example, if a 
> matcher matches *my.html and generates using my.xml, we will have 
> infinitely many possible result docs, named ciaomy.html, himy.html, 
> themy.html, etc.
>
> A pragmatic solution is to statically generate, similarly to wget, 
> only the pages that are reachable from an initial set, usually the 
> homepage. In fact this makes sense, as site contents should always be 
> reachable from the homepage.
>
> An alternative is to define an easy mapping from source files, as 
> Anakia does, and stick to that. In this way, by reading a source dir, 
> we already have the resulting files. But since we want a totally 
> interlinked site, do we really need this?
>
> I would simply add, beside the "pdf" and "print" buttons a "lang" 
> section where we can select the language. In this way all pages would 
> be easily reachable.

Hi Nicola Ken! Long time no speak!

Thank you for this explanation. Thanks for elaborating on Cheche's idea. 
It makes more sense to me now. Especially if we use the hreflang 
attribute to identify the locale of the page we're crawling to. The 
question still remains: how do we identify the translations that exist 
for a page? Just using the DirectoryGenerator and XSLT would be a 
performance nightmare. Do we need to extend the DirectoryGenerator into 
a TranslationGatheringGenerator, where you pass it the name of a page: 
src/xml/index.LANG.xml, and it returns src/xml/index.en.xml, 
src/xml/index.de.xml, etc, etc? From there you can build your 
translations section.
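
To make the lookup concrete, here is a minimal sketch in Python (not a
Cocoon generator). It assumes the locale sits between the page name and
the extension, as in index.LANG.xml; the function name
gather_translations and the LANG placeholder handling are hypothetical.

```python
from pathlib import Path


def gather_translations(pattern: str, placeholder: str = "LANG") -> dict[str, Path]:
    """Map each locale to the existing file that matches the pattern.

    Assumption: the pattern's file name contains the placeholder exactly
    where the locale goes, e.g. src/xml/index.LANG.xml.
    """
    template = Path(pattern)
    prefix, _, suffix = template.name.partition(placeholder)
    found: dict[str, Path] = {}
    if not template.parent.is_dir():
        return found
    for candidate in sorted(template.parent.iterdir()):
        name = candidate.name
        # Require a non-empty locale between prefix and suffix, so that a
        # plain index.xml is not mistaken for a translation.
        if (candidate.is_file()
                and name.startswith(prefix)
                and name.endswith(suffix)
                and len(name) > len(prefix) + len(suffix)):
            locale = name[len(prefix):len(name) - len(suffix)]
            found[locale] = candidate
    return found
```

This is a single directory listing per page rather than a
DirectoryGenerator plus XSLT pass. Note that it treats anything between
the prefix and the suffix as a locale, so a real component would
probably want to validate the result against the known locales.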

I still plan to add the ability to have the CLI crawl from the home page 
for a number of locales. It may well be useful to me, and would be 
useful where people don't want to specify the translations available on 
every page.
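
A rough sketch of that crawl, in Python rather than in the CLI itself.
The site is modelled as an in-memory dict, which is purely an assumption
for illustration: keys are (url, lang) pairs and values are the links on
that page as (href, hreflang) pairs, with hreflang set to None when the
link carries no such attribute.

```python
from collections import deque


def crawl(site, home, locales):
    """Breadth-first crawl from the home page in each starting locale.

    A link without hreflang is assumed to stay in the language of the
    page it was found on. Returns the (url, lang) pairs in visit order.
    """
    seen = set()
    order = []
    queue = deque((home, locale) for locale in locales)
    while queue:
        page = queue.popleft()
        if page in seen or page not in site:
            continue  # already generated, or no such translation
        seen.add(page)
        order.append(page)
        _url, lang = page
        for href, hreflang in site[page]:
            queue.append((href, hreflang or lang))
    return order


site = {
    ("index.html", "en"): [("about.html", None), ("index.html", "pl")],
    ("about.html", "en"): [],
    ("index.html", "pl"): [("about.html", None)],
    ("about.html", "pl"): [],
    ("orphan.html", "pl"): [],  # not linked from anywhere
}
pages = crawl(site, "index.html", ["en"])
```

Here the Polish pages are reached through the hreflang link on the
English home page, while orphan.html is never generated because nothing
links to it.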

> The question is how to do this: combobox, list with css, list with 
> javascript?

Ah, now that is up to the Forrest guys... apart from the fact that a 
combo box would use links that the CLI couldn't follow.

As to the other question, the syntax for getting filenames: Vadim 
suggested using the LocaleAction, but that doesn't help us to support 
fallback (e.g. from de-at to just de, where no page.de.at.xml exists), 
or cases where we have multiple file formats, and so on.
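
The fallback order itself is simple to write down. A minimal sketch in
Python, assuming locales are written like de-at and file names use dots
between the locale parts, as in page.de.at.xml; the function names
fallback_chain and candidate_names are hypothetical, not part of the
LocaleAction or any Cocoon API.

```python
def fallback_chain(locale: str) -> list[str]:
    """Return the locale followed by its less specific fallbacks."""
    parts = locale.replace("_", "-").split("-")
    return ["-".join(parts[:n]) for n in range(len(parts), 0, -1)]


def candidate_names(page: str, locale: str, extensions=("xml",)) -> list[str]:
    """List the file names to try, most specific locale first.

    For each locale in the chain, every file format is tried before
    falling back to the next, less specific locale.
    """
    names = []
    for loc in fallback_chain(locale):
        for ext in extensions:
            names.append(f"{page}.{loc.replace('-', '.')}.{ext}")
    return names


print(candidate_names("page", "de-at"))
# prints ['page.de.at.xml', 'page.de.xml']
```

Whether the format or the locale should win when both vary is a design
choice; this sketch prefers the more specific locale.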

So, we need a way to get hold of the source for the page, using the 
locale, or locales, and using fallback. To do this we need to specify a 
pattern that will make up the filename. But at present, we can't nest 
input modules. So I'm kinda stuck. We need to say something like:

 <map:match pattern="**/*.html">
    <map:generate src="{1}/{localized-page:{2}.{locale}.xml}"/>
    <map:transform src="....."/>
    <map:serialize/>
 </map:match>

Where localized-page: causes the system to try all values of {locale} in 
the correct order until it finds a source that exists. But we can't nest 
input modules. [I can see how to rewrite them so they could be nested, 
but it would involve writing a simple LALR parser (I think) and it's a 
long time since I've written one!] With nested input modules, this would 
be easy. Without them, I cannot think how to do it.
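
For what it's worth, nesting of this kind can be handled by matching
braces recursively, without a full LALR parser. The following is only a
sketch in Python under several assumptions, and not how Cocoon resolves
sitemap variables: names that are not known variables (here {locale})
are left in place for the enclosing module to interpret, the module sees
only the file name and not the directory, and expand, localized_page and
the set of existing files are all hypothetical.

```python
def expand(expr, variables, modules):
    """Expand {name} and {module:arg} references, innermost first."""
    out = []
    i = 0
    while i < len(expr):
        if expr[i] != "{":
            out.append(expr[i])
            i += 1
            continue
        # Find the brace that closes the one at position i.
        depth = 0
        j = i
        while j < len(expr):
            if expr[j] == "{":
                depth += 1
            elif expr[j] == "}":
                depth -= 1
                if depth == 0:
                    break
            j += 1
        else:
            raise ValueError(f"unbalanced '{{' in {expr!r}")
        inner = expand(expr[i + 1:j], variables, modules)
        name, sep, arg = inner.partition(":")
        if sep and name in modules:
            out.append(modules[name](arg))
        elif inner in variables:
            out.append(variables[inner])
        else:
            # Unknown name: keep it so an outer module can handle it.
            out.append("{" + inner + "}")
        i = j + 1
    return "".join(out)


def localized_page(locales, existing):
    """Build a module that tries each locale until a source exists."""
    def lookup(pattern):
        for locale in locales:
            candidate = pattern.replace("{locale}", locale)
            if candidate in existing:
                return candidate
        raise LookupError(f"no source matches {pattern!r}")
    return lookup


variables = {"1": "docs", "2": "index"}
modules = {"localized-page": localized_page(
    ["de.at", "de"], {"index.de.xml", "index.en.xml"})}
print(expand("{1}/{localized-page:{2}.{locale}.xml}", variables, modules))
# prints docs/index.de.xml
```

The inner {2} is resolved before localized-page: runs, which is the
behaviour the sitemap snippet above relies on.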

Also, I can have the CLI put something into the object model that tells 
this new component not to do fallback. In that case it could throw a 
LocaleNotFoundException, which the CLI would catch to identify that the 
page _shouldn't_ have been generated (i.e. it isn't an error). Without 
this flag in the object model, the component would try to do whatever 
fallback it can.
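
A small Python sketch of that contract. The no_fallback parameter stands
in for the flag the CLI would put into the object model, and
resolve_source and generate_page are hypothetical names. What should
happen when even fallback finds nothing isn't settled above; this sketch
simply raises the same exception.

```python
class LocaleNotFoundException(Exception):
    """Raised when no source exists for the requested locale."""


def resolve_source(page, locale, existing, no_fallback=False):
    """Return the source file name for a page in the given locale."""
    parts = locale.split("-")
    chain = ["-".join(parts[:n]) for n in range(len(parts), 0, -1)]
    if no_fallback:
        chain = chain[:1]  # only the exact locale is acceptable
    for loc in chain:
        name = f"{page}.{loc.replace('-', '.')}.xml"
        if name in existing:
            return name
    raise LocaleNotFoundException(f"{page} has no source for {locale}")


def generate_page(page, locale, existing):
    """CLI side: a missing translation means 'skip', not 'error'."""
    try:
        return resolve_source(page, locale, existing, no_fallback=True)
    except LocaleNotFoundException:
        return None  # this page should not be generated for this locale


existing = {"index.de.xml", "index.en.xml"}
print(resolve_source("index", "de-at", existing))  # prints index.de.xml
print(generate_page("index", "de-at", existing))   # prints None
```

So a live request for de-at is served from index.de.xml, while the
off-line crawl skips the de-at page rather than writing out a duplicate
of the de one.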

I've pretty much finished my conversion of the Cocoon wiki to Moin, so 
my next 'part time' project is to add translations to my Cocoon based 
(off-line generated) site, fwbo.org. I've got all the content already 
translated into Polish, so now I've just got to work out the 
infrastructure to serve it. So, if we can agree on/find a way to do 
this, I'll be on to making it happen (CLI amendments and new Cocoon I18N 
component(s)).

So, to summarise my current understanding of our discussions:

1) Pages will include list of translations available for that page
2) CLI will have to grab the language as well as the page link when 
gathering links
3) CLI will be able to crawl from home page in a number of different 
languages
4) Some way of gathering the list of valid translations for a page is 
required. See suggestion above.
5) Some way of getting the appropriate source file given a locale and a 
filename pattern is required

Is that everything?

Regards, Upayavira


