From: Sjur Nørstebø Moshagen
Subject: Re: i18n suggestion
Date: Tue, 16 Mar 2004 14:11:06 +0200
To: forrest-dev@xml.apache.org

On 16 Mar 2004, at 11:58, Upayavira wrote:

>>> One remaining point: how do we handle crawling? Do we crawl a page,
>>> then seek all translations of it, or do we crawl from each
>>> language's homepage, following links that way? Make sense?
>>
>> ...
>>
>>> a.html
>> a.html
>> b.html
>> b.html
>> c.html
>> c.html
>> d.html
>> d.html
>
>>
>> Maybe this? That is, crawl a page, then all translations of it?
>
> If we do this, we slow things down (in that we will get a lot of
> broken pages for language versions that don't exist, and we will
> break the use of broken-link handling to spot errors in the site).
> Or, if we have default language technology in place, the site would
> return the default language for each and every non-existent source
> file. So we could have foo.en.html, foo.de.html, foo.es.html, all
> containing the English version of the site. Actually, what we want
> is for the dynamic version of the site to serve the default
> language, and the static version to throw an error - which causes
> no page to be written.

Just to make sure we understand each other: "dynamic" as in "servlet"?

I am assuming we have the default language technology in place, and the content negotiation functionality for files we have been discussing. We agree on the dynamic/servlet version.
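(To make that concrete for myself, here is a rough Java sketch of the fallback behaviour I have in mind; all class, method and file names are invented for illustration, this is not actual Forrest or Cocoon code:)

import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

public class LocaleFallback {

    // Look for base_de_AT.xml, then base_de.xml; an empty result means
    // "no localised variant of this source exists".
    static Optional<File> localizedVariant(File dir, String base, Locale locale) {
        List<String> candidates = new ArrayList<>();
        if (!locale.getCountry().isEmpty()) {
            candidates.add(base + "_" + locale.getLanguage() + "_" + locale.getCountry() + ".xml");
        }
        candidates.add(base + "_" + locale.getLanguage() + ".xml");
        for (String name : candidates) {
            File f = new File(dir, name);
            if (f.exists()) {
                return Optional.of(f);
            }
        }
        return Optional.empty();
    }

    // Dynamic (servlet) version: always serve something, falling back to
    // the default-language source base.xml.
    static File resolveDynamic(File dir, String base, Locale locale) {
        return localizedVariant(dir, base, locale).orElse(new File(dir, base + ".xml"));
    }

    // Static version: an empty result is the "throw an error, write no page"
    // signal the crawler needs.
    static Optional<File> resolveStatic(File dir, String base, Locale locale) {
        return localizedVariant(dir, base, locale);
    }
}

The point being that the static crawl gets an explicit "nothing localised here" signal it can turn into an error and skip writing the page, while the servlet always has something to serve.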
Regarding the static version, the picture is somewhat more complicated. The main idea is to keep the servlet and the static versions identical. The problem with Forrest (and most Cocoon-based sites, I assume) is that a single page foo.html is built from several different sources: foo.xml, menu.xml, tabs.xml, etc. For a given locale de_AT, foo_de_AT.xml might not exist, but you might have menu_de_AT.xml and maybe tabs_de_AT.xml. What do you do?

My understanding of what we have been discussing is that the servlet version would create a page foo.html with default content, but with menus and tabs in Austrian German. Is the resulting page "localised"? Technically, yes, even though only a small part of the page's content has been localised. That means you _should_ create a page foo.html.de.at, even though the main content is in the same language as the default foo.html. Only if _none_ of the sources used to build a page is available in the specified locale should you return an error or the default page.

> On the other hand, if we use my other method - crawling from each
> language's homepage one at a time - then any language pages that
> can't be reached directly from that language's homepage won't be
> generated. But maybe, if a page can't be reached from its language's
> homepage, there is some kind of error in the site? WDYT?

I am not sure I understand the difference between the CLI and the crawling process, or the relationship between them. You have said earlier that you want the CLI to request pages in exactly the same way a browser would. If so, you don't crawl from a language's home page to referenced pages of the same language - you crawl from the home page with a requested locale to other pages with the same requested locale. What you get in return would thus depend on how the locale is handled, wouldn't it?

Sjur
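P.S. A small self-contained sketch of the "create foo.html.de.at if at least one source is localised" rule above; again, every name here is invented for illustration, not actual Forrest code:

import java.io.File;
import java.util.List;
import java.util.Locale;

public class PageAssembly {

    // Does a locale-specific variant of base.xml exist in dir?
    // Checks base_de_AT.xml, then base_de.xml (file names illustrative only).
    static boolean hasLocalizedVariant(File dir, String base, Locale locale) {
        String lang = locale.getLanguage();
        String country = locale.getCountry();
        if (!country.isEmpty()
                && new File(dir, base + "_" + lang + "_" + country + ".xml").exists()) {
            return true;
        }
        return new File(dir, base + "_" + lang + ".xml").exists();
    }

    // The rule discussed above: emit a localised page if at least one of the
    // page's sources (foo.xml, menu.xml, tabs.xml, ...) is localised; if none
    // of them are, serve the default page (dynamic) or write nothing (static).
    static boolean shouldEmitLocalisedPage(File dir, List<String> sources, Locale locale) {
        for (String base : sources) {
            if (hasLocalizedVariant(dir, base, locale)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        File dir = new File("content/xdocs");
        Locale deAT = new Locale("de", "AT");
        if (shouldEmitLocalisedPage(dir, List.of("foo", "menu", "tabs"), deAT)) {
            System.out.println("write foo.html.de.at (at least one source is localised)");
        } else {
            System.out.println("nothing localised for de_AT: default page (dynamic) or no page (static)");
        }
    }
}

So foo.html.de.at would be written as soon as any one of foo, menu or tabs has a de_AT (or de) variant; only when the check fails for all of them do we fall back to the default page (servlet) or no page at all (static).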