forrest-dev mailing list archives

From Sjur Nørstebø Moshagen <sjur.mosha...@kolumbus.fi>
Subject Re: i18n suggestion
Date Tue, 16 Mar 2004 12:11:06 GMT
On 16 Mar 2004, at 11:58, Upayavira wrote:

>>> One remaining point: how do we handle crawling? Do we crawl a page, 
>>> then seek all translations of it, or do we crawl from each 
>>> language's homepage, following links that way? Make sense?
>>
>> ...
>>
>>> a.html <en
>>> a.html <de
>>> b.html <en
>>> b.html <de
>>> c.html <en
>>> c.html <de
>>> d.html <en
>>> d.html <de
>>
>>
>> Maybe this? That is, crawl a page, then all translations of it?
>
> If we do this, we slow things down (in that we will get a lot of 
> broken pages for language versions that don't exist), and we will break 
> the use of broken-link handling to spot errors in the site. Or, if we 
> have default language technology in place, the site would return the 
> default language for each and every non-existent source file. So we 
> could have foo.en.html, foo.de.html, foo.es.html, all containing the 
> English version of the site. Actually, what we want is for the dynamic 
> version of the site to serve the default language, and the static 
> version to throw an error - which causes no page to be written.

Just to make sure we understand each other: "dynamic" as in "servlet"?

I am assuming we have the default language technology in place, and the 
content negotiation functionality for files we have been discussing.

We agree on the dynamic/servlet version.

Regarding the static version, the picture is somewhat more complicated. 
The main idea is to keep the servlet and the static versions identical. 
The problem with Forrest (and most Cocoon-based sites, I assume) is 
that a single page foo.html is made up of several different sources: 
foo.xml, menu.xml, tabs.xml, etc. For a given locale de_AT, 
foo_de_AT.xml might not exist, but you might have menu_de_AT.xml, and 
maybe tabs_de_AT.xml. What do you do?

My understanding of what we have been discussing is that the servlet 
version would create a page foo.html with default content, but with 
menus and tabs in Austrian German. Is the resulting page "localised"? 
Technically, yes, even though only a small part of the page's 
content has been localised. Which means that you _should_ create a 
page foo.html.de.at, even though the main content is in the same 
language as the default foo.html. Only if _none_ of the sources used to 
build a page is available in the specified locale should you return an 
error or the default page.
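
To make the rule above concrete, here is a minimal sketch in Python (not Forrest or Cocoon code). The function names and the in-memory set of available files are hypothetical, and the foo_de_AT.xml naming is simply taken from the examples above. Each source is looked up in the requested locale and falls back to the default, and the page counts as localised if at least one source was found in that locale.

```python
from pathlib import PurePosixPath


def localised_name(source, locale):
    """Map foo.xml + de_AT to foo_de_AT.xml (naming assumed from the examples)."""
    p = PurePosixPath(source)
    return str(p.with_name(f"{p.stem}_{locale}{p.suffix}"))


def resolve_page(sources, locale, available):
    """Choose, per source, the localised file if it exists, else the default.

    Returns (chosen, localised); localised is True when at least one
    source was found in the requested locale.
    """
    chosen = []
    localised = False
    for src in sources:
        candidate = localised_name(src, locale)
        if candidate in available:
            chosen.append(candidate)
            localised = True
        else:
            chosen.append(src)
    return chosen, localised


# Hypothetical site: only the menu and the tabs exist in Austrian German.
available = {"foo.xml", "menu.xml", "tabs.xml", "menu_de_AT.xml", "tabs_de_AT.xml"}
sources = ["foo.xml", "menu.xml", "tabs.xml"]

chosen, localised = resolve_page(sources, "de_AT", available)
# Write foo.html.de.at only when localised is True; otherwise return an
# error or the default page.
```

Whether a missing de_AT source should first fall back to a plain de version before the default is left out here; that would be a further assumption.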

> On the other hand, if we use my other method - crawling from each 
> language's homepage one at a time, if there are any language pages 
> that can't be reached directly from that language's homepage, then 
> they won't be generated. But maybe, if a page can't be reached from 
> its language's homepage, there is some kind of error in the site? 
> WDYT?

I am not sure I understand the differences between the CLI and the 
crawling process, or the relationship between them. Earlier you 
said that you want the CLI to request pages in exactly the same way a 
browser would. If so, you don't crawl from a language's home page to 
referenced pages of the same language - you crawl from the home page 
with a requested locale to other pages with the same requested locale. 
What you get in return would thus depend on how the locale is handled, 
wouldn't it?
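
As a rough illustration of what I mean, here is a sketch in Python of a crawl in which every request carries the same requested locale. It is not how the Cocoon CLI is implemented; crawl, fetch, SITE and LOCALISED are all hypothetical names, and the tiny in-memory site only stands in for real requests.

```python
from collections import deque


def crawl(start, locale, fetch):
    """Breadth-first crawl in which every request carries the same locale.

    fetch(path, locale) is a hypothetical callable returning
    (content, links), or None when the page cannot be served.
    """
    seen = {start}
    queue = deque([start])
    pages = {}
    while queue:
        path = queue.popleft()
        result = fetch(path, locale)
        if result is None:
            continue  # broken link: nothing is written for this path
        content, links = result
        pages[path] = content
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


# Hypothetical site: link structure plus the pages that exist in German.
SITE = {
    "index.html": ["a.html", "b.html"],
    "a.html": ["index.html"],
    "b.html": ["c.html"],
    "c.html": [],
}
LOCALISED = {("index.html", "de"), ("a.html", "de")}


def fetch(path, locale):
    """Serve the localised page if it exists, else the default language."""
    if path not in SITE:
        return None
    lang = locale if (path, locale) in LOCALISED else "default"
    return f"{path} [{lang}]", SITE[path]


pages = crawl("index.html", "de", fetch)
```

With this fetch, b.html and c.html come back in the default language even though the crawl asked for de, so what ends up on disk is decided by the locale handling and not by the crawler.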

Sjur

