forrest-dev mailing list archives

From Upayavira ...@upaya.co.uk>
Subject Re: i18n suggestion
Date Tue, 16 Mar 2004 12:22:04 GMT
Sjur Nørstebø Moshagen wrote:

> On 16 Mar 2004, at 11:58, Upayavira wrote:
>
>>>> One remaining point: how do we handle crawling? Do we crawl a page, 
>>>> then seek all translations of it, or do we crawl from each 
>>>> language's homepage, following links that way? Make sense?
>>>
>>> ...
>>>
>>>> a.html <en
>>>> a.html <de
>>>> b.html <en
>>>> b.html <de
>>>> c.html <en
>>>> c.html <de
>>>> d.html <en
>>>> d.html <de
>>>
>>>
>>>
>>> Maybe this? That is, crawl a page, then all translations of it?
>>
>>
>> If we do this, we slow things down (in that we will get a lot of 
>> broken pages for language versions that don't exist), and we will 
>> break the use of broken-link handling to spot errors in the site. Or, 
>> if we have default language technology in place, the site would 
>> return the default language for each and every non-existent source 
>> file. So we could have foo.en.html, foo.de.html, foo.es.html, all 
>> containing the English version of the site. Actually, what we want is 
>> for the dynamic version of the site to serve the default language, 
>> and the static version to throw an error - which causes no page to be 
>> written.
>
>
> Just to make sure we understand each other: "dynamic" as in "servlet"?

Yes. And static = files created offline by the CLI.

> I am assuming we have the default language technology in place, and 
> the content negotiation functionality for files we have been discussing.
>
> We agree on the dynamic/servlet version.
>
> Regarding the static version, the picture is somewhat more 
> complicated. The main idea is to keep the servlet and the static 
> versions identical. The problem with Forrest (and most Cocoon-based 
> sites, I assume), is that one single page foo.html is made up of 
> several different sources: foo.xml, menu.xml, tabs.xml, etc. For a 
> given locale de_AT, foo_de_AT.xml might not exist, but you might have 
> menu_de_AT.xml, and maybe tabs_de_AT.xml. What do you do?
>
> My understanding of what we have been discussing, is that the servlet 
> version would create a page foo.html with default content, but with 
> menus and tabs in Austrian German. Is the resulting page "localised"? 
> Technically, yes, even though it is only a small part of the page's 
> content that has been localised. Which means that you _should_ create 
> a page foo.html.de.at, even though the main content is in the same 
> language as the default foo.html. Only if _none_ of the sources used 
> to build a page is available in the specified locale should you 
> return an error or the default page.

For static, no. Say we're creating a new translation, into Polish. One of 
the first things we'd do is create a menu_pl.xml. Having done this, 
_all_ pages on the site would now be considered Polish, even though the 
menu is the only thing that has actually been translated. Not very 
helpful to the Polish speaker. Therefore, I would say the page should 
only be created if its main content exists in that language.

>> On the other hand, if we use my other method - crawling from each 
>> language's homepage one at a time - then any language pages that 
>> can't be reached directly from that language's homepage won't be 
>> generated. But maybe, if a page can't be reached from its language's 
>> homepage, there is some kind of error in the site? WDYT?
>
> I am not sure I understand the differences between the CLI and the 
> crawling process, or the relationship between them. 

The CLI contains a crawler. It does the crawling.

> Earlier you have said that you want the CLI to request pages in 
> exactly the same way a browser would do. If so, you don't crawl from a 
> language's home page to referenced pages of the same language - you 
> crawl from the home page with a requested locale to other pages with 
> the same requested locale. What you get in return would thus depend on 
> how the locale is handled, wouldn't it? 

Hmm. What I had in mind is that you would say to the CLI: locales="en, 
de-at, de-de, es, pl, pt". This would then request each relevant page 
six times, once per provided locale. Thus you would get pages created 
for each of your locales. Where no sources exist for a given language, 
no page would be created. So there's no fallback here: the fallback is 
handled by the web server, i.e. Apache.
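The crawling scheme described here - request every page once per configured
locale, and simply skip a page/locale pair when no source exists - can be
sketched as follows. `fetch` is a stand-in for the CLI's request to Cocoon,
not a real Forrest function:

```python
def crawl(pages, locales, fetch):
    """Request every page once per configured locale, as the CLI would.

    fetch(page, locale) stands in for the Cocoon request; it returns
    rendered content, or None when no source exists for that locale.
    There is deliberately no fallback here - fallback to a default
    language is left to the web server serving the static files.
    """
    generated = {}
    for locale in locales:
        for page in pages:
            content = fetch(page, locale)
            if content is not None:
                generated[(page, locale)] = content
    return generated
```

With locales like "en, de-at, de-de, es, pl, pt", each page is requested six
times, and only the pages whose sources exist in a given locale end up in the
static output.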

Upayavira, who's on a steep learning curve regarding i18n at the moment!


