From: Rob Weir
To: "dev@openoffice.apache.org"
Date: Mon, 26 Aug 2013 10:20:43 -0400
Subject: Re: Brainstorming: Can we refactor the website to make translation easier?
In-Reply-To: <308FCCD2-64EB-46C3-9B3F-B43B26570D1F@apache.org>

On Mon, Aug 26, 2013 at 9:57 AM, Dave Fisher wrote:
> Since some of us, like me, are on vacation, would it be possible to either put this into a cwiki page or some other clear summary? I have a lot I can add about the current setup, like why you lost the rightnav: i.e., you need a subdirectory in templates.
>

Great. I was hoping that you in particular would weigh in on this,
since I know you have given this a lot of thought as well. I'll
continue hacking in the xx directory and start a new thread
summarizing the state of the work in a couple of weeks.

-Rob

> Regards,
> Dave
>
> Sent from my iPhone
>
> On Aug 26, 2013, at 8:45 AM, Rob Weir wrote:
>
>> On Fri, Aug 23, 2013 at 5:12 PM, janI wrote:
>>> On 23 August 2013 21:11, Rob Weir wrote:
>>>
>>>> On Fri, Aug 23, 2013 at 12:21 PM, janI wrote:
>>>>> On 23 August 2013 17:58, Rob Weir wrote:
>>>>>
>>>>>> (Responses to dev@openoffice.apache.org, please)
>>>>>>
>>>>>> Obviously our website is quite large. Google reports 21207 pages
>>>>>> indexed in the www subdomain, and a further 48075 pages in the wiki
>>>>>> subdomain. But for the purpose of this post, when I talk about the
>>>>>> "home page" I'm talking about the contents of our main index.html and
>>>>>> the most commonly visited pages directly linked to it, e.g., the
>>>>>> why/download/product/get-involved, etc. pages.
>>>>>>
>>>>>> This core homepage content amounts to around 25 pages.
>>>>>>
>>>>>> Today this content is scattered around the content tree. Some of it
>>>>>> is in the root. Some of it is in the /why and /download directories.
>>>>>> Some of it is template-related and is in /templates rather than in
>>>>>> /content.
>>>>>>
>>>>>> As a test I tried to create my own NL page, in the fictitious "xx"
>>>>>> locale. You can see it here: http://www.openoffice.org/xx/
>>>>>>
>>>>>> It is not working correctly, but it already required a lot of
>>>>>> non-trivial hacking:
>>>>>>
>>>>>> 1) I had to hunt around and guess which files to copy. Do I copy
>>>>>> scripts, images and CSS, or just content pages? Some of the
>>>>>> directories had outdated content that was not linked to by anyone.
>>>>>> It was hard to figure out what the minimum amount of content needed
>>>>>> was, and where it was located.
>>>>>>
>>>>>> 2) The main index.html file had to be edited to refer to CSS in the
>>>>>> root, rather than in the current directory.
>>>>>>
>>>>>> 3) The download page is messed up, missing CSS and/or scripts.
>>>>>> Presumably I need to copy something into the xx/download dir, or edit
>>>>>> scripts to make them refer to /download off the root.
>>>>>>
>>>>>> 4) The /xx/why pages are not showing the right-side navigation now. I
>>>>>> must have missed something there as well.
>>>>>>
>>>>>> Of course, I could figure the above out eventually. It just requires
>>>>>> some time and effort and trial and error. But none of this is
>>>>>> documented, and even if it were, this is a fragile approach and
>>>>>> probably beyond the web development skills of a typical translator.
>>>>>>
>>>>>> But we do know this has been done for some languages. They got it to
>>>>>> work. The German page is a good example:
>>>>>>
>>>>>> http://www.openoffice.org/de/
>>>>>>
>>>>>> Now this looks good, but it is still messy from a maintenance
>>>>>> perspective. If we make structural changes to the main English page,
>>>>>> then those changes need to be manually merged into every NL page.
>>>>>>
>>>>>> What can we do to improve this?
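Problems 2) and 3) above come down to how a browser resolves relative URLs. A minimal Python sketch using the standard library's `urllib.parse.urljoin` (which follows the RFC 3986 resolution rules browsers also use) shows why a page copied from the root into /xx/ loses its stylesheet when the reference is document-relative, while a root-relative reference keeps working. The stylesheet name `styles/home.css` is hypothetical.

```python
from urllib.parse import urljoin

SITE = "http://www.openoffice.org"


def resolve(page_path, ref):
    """Resolve a link or stylesheet reference found on page_path, as a browser would."""
    return urljoin(SITE + page_path, ref)


# "styles/home.css" is a hypothetical stylesheet used only for illustration.
for page in ("/index.html", "/xx/index.html"):
    print(page)
    # Document-relative: resolved against the directory the page lives in.
    print("  styles/home.css  ->", resolve(page, "styles/home.css"))
    # Root-relative: always resolved against the site root.
    print("  /styles/home.css ->", resolve(page, "/styles/home.css"))
```

From /xx/index.html the first form points at /xx/styles/home.css, which only exists if the stylesheet is copied as well; the second form points at the one shared /styles/home.css no matter which locale directory the page sits in.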
>>>>>>
>>>>>> Here's my idea:
>>>>>>
>>>>>> 1) What if we refactored the home page so it was all self-contained
>>>>>> in these directories: /scripts, /styles, /images and /en/?
>>>>>>
>>>>>> 2) Make the /en directory pure content: only the stuff that needs
>>>>>> to be translated. It loads everything else (scripts, images, etc.)
>>>>>> via URLs relative to the root, e.g., in /scripts, /styles, etc.
>>>>>>
>>>>>> 3) Reduce or eliminate any embedded JavaScript within pages. For
>>>>>> example, refactor the code in download/index.html so it is external
>>>>>> and depends on JSON resource files for translated strings. The aim is
>>>>>> that translators never need to touch script.
>>>>>>
>>>>>> 4) The ultimate goal is for someone to be able to jump-start a new NL
>>>>>> home page by simply requesting an svn copy of the /en directory, and
>>>>>> then editing the resulting files. No one should ever need to do what
>>>>>> I'm doing with the "xx" pages.
>>>>>>
>>>>>> 5) Maintenance is far easier. Most things, like changing the scripts,
>>>>>> are done in one place only. But even changes to the HTML are easier.
>>>>>> Since we then have a common branch point via the svn copy, when
>>>>>> structural changes are added to the main /en HTML, these can be merged
>>>>>> more elegantly into the translated versions, using Subversion.
>>>>>>
>>>>>> 6) Via Apache redirects we can ensure that the default call to
>>>>>> www.openoffice.org/ goes to /en/. Conceivably we could also do locale
>>>>>> detection and send requests automatically to the appropriate NL home
>>>>>> page.
>>>>>>
>>>>>> A variation on the above would be to use Pootle, rather than svn
>>>>>> copy/merge, to maintain the translations. But that would require the
>>>>>> same refactoring work to enable it. And it would require further
>>>>>> investigation to identify a way of extracting and merging translation
>>>>>> strings in MDText files as well as (X)HTML files.
>>>>>>
>>>>>> This is obviously more than a one-person task.
>>>>>> So I'd be interested in hearing what you think in general about this
>>>>>> approach, whether there is a simpler alternative I've missed, and
>>>>>> whether this is something you'd be interested in helping with.
>>>>>
>>>>> I like a lot of your ideas; let me add my own experience.
>>>>
>>>> Thanks.
>>>>
>>>>> If our pages do not contain text, but all of it is outsourced to one
>>>>> or more JSON objects, then translation becomes easy, and the pages
>>>>> themselves stay simple. When the URL is called without arguments the
>>>>> "en-json" is used, and if called with lang="xx", "xx-json" is used.
>>>>
>>>> I like the idea of content/code separation. We certainly do this in
>>>> the code, for example. But there are two challenges to taking this
>>>> approach all the way with the website.
>>>>
>>>> 1) If we do JSON everywhere then we have a JavaScript dependency
>>>> everywhere. This has an impact on the visibility of the pages to
>>>> search engines, but there are workarounds. But it may be a bigger
>>>> issue for users who block JavaScript.
>>>
>>> No, we would do that solely on the server side; it would not be a good
>>> idea to have JS retrieve the JSON objects.
>>>
>>> We could, e.g., use PHP that retrieves the correct JSON object and
>>> transmits a finished page.
>>
>> OK. We're on the same page.
>>
>>>
>>>> 2) There may be cases where a translation requires direct access to
>>>> the HTML or CSS. For example, I think the Tamil translation needed
>>>> access to specify a particular font. And for some languages they might
>>>> need to set text direction to RTL. These kinds of things make almost
>>>> any approach more complicated.
>>>
>>> Look at, e.g., our mwiki, which handles all those details on the server
>>> side.
>>>
>>> And just as a suggestion, if we were to use WordPress, things like fonts
>>> would be solved. WP also has a (non-JSON) mechanism for multiple
>>> languages, which I could easily adopt in genLang (for translation).
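The string-table approach janI describes (no text in the page itself, one JSON object per language, English when no lang argument is given, a finished page produced on the server) can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not the PHP janI suggests and not existing genLang or CMS code: the table contents, the keys `title` and `button`, and the helpers `strings_for()` and `render()` are all hypothetical, and the tables are inline JSON strings rather than separate resource files so the example is self-contained.

```python
import html
import json
from string import Template

# Hypothetical string tables, one JSON object per locale. On a real site
# these would live in separate resource files maintained by translators.
STRING_TABLES = {
    "en": json.loads('{"title": "Download Apache OpenOffice", "button": "Download"}'),
    # Partial translation for the fictitious "xx" locale: "button" is missing.
    "xx": json.loads('{"title": "XX: Download Apache OpenOffice"}'),
}

# Language-neutral markup: placeholders only, no translatable text. The
# link is root-relative, so the same template works under /en/ or /xx/.
PAGE = Template('<h1>$title</h1>\n<a class="button" href="/download/">$button</a>')


def strings_for(lang=None):
    """Return the strings for lang, falling back to English key by key."""
    merged = dict(STRING_TABLES["en"])  # no lang argument -> English
    merged.update(STRING_TABLES.get(lang, {}))  # unknown lang -> nothing added
    return merged


def render(lang=None):
    """Produce a finished HTML fragment; no JavaScript is needed for the text."""
    # Escape translated text so a stray "<" or "&" cannot break the markup.
    strings = {key: html.escape(value) for key, value in strings_for(lang).items()}
    # substitute() raises KeyError if a placeholder has no English string,
    # so a missing key fails loudly instead of leaking "$button" into the page.
    return PAGE.substitute(strings)


if __name__ == "__main__":
    print(render())
    print(render("xx"))
```

The same `render()` call could run per request (janI's server-side idea) or once per locale at build time, writing out static pages, which is closer to the build-time generation Rob prefers later in the thread.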
>>>
>>>
>>>>
>>>> So the question we need to answer is: how far do we take this? I think
>>>> we have some examples where the code is so intertwined with the text
>>>> that translation becomes very hard and risky. For example, the
>>>> generation of the "boxes" on the download page. But then we have
>>>> some other pages, especially the MDText pages, where I would be
>>>> comfortable handing them directly to a translator and expecting they
>>>> could edit them without breaking anything.
>>> We can always find examples where it becomes hard, but typically you can
>>> reformulate the problem so it fits a standard approach (the boxes are no
>>> real problem). The only problem I see is with JS, where we ask questions
>>> and get answers, e.g., Y/N.
>>>
>>>
>>>>
>>>> The JavaScript dependency might be avoided if we make this a CMS
>>>> build-time text replacement rather than a runtime/JavaScript
>>>> replacement. So the CMS would detect when the Pootle files change and
>>>> automatically generate new HTML pages from them. But even then we
>>>> would still need some runtime integration of strings, specifically on
>>>> the download page, where language and OS are determined at runtime
>>>> based on browser request headers.
>>> I would consider not using the CMS, because we basically don't need it.
>>
>> So within the realm of server-side software, what is possible? My
>> impression was that Infra generally cautions against runtime server-side
>> execution due to the greater opportunity for security problems.
>> That's why we're using build-time page generation. This approach also
>> performs well, since it serves static HTML pages at runtime, and is very
>> stable.
>>
>> But in any case, I think the refactoring work is approximately the
>> same regardless of how the pages are generated.
>>
>> -Rob
>>
>>
>>> rgds
>>> jan I.
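The runtime language detection mentioned for the download page, and the locale redirect in item 6) of the original proposal, both start from the browser's Accept-Language request header. A minimal Python sketch of picking the best available NL home page from that header follows. The set of available locales and the helper names are hypothetical, tag syntax is not validated, and OS detection is left out. Apache httpd's mod_negotiation can also negotiate on Accept-Language, so custom code may not be needed at all.

```python
# Hypothetical set of locales that have an NL home page (/de/ and the "xx"
# test page exist today; /en/ is the proposed English home).
AVAILABLE = frozenset({"en", "de", "xx"})
DEFAULT = "en"


def parse_accept_language(header):
    """Split an Accept-Language header into (tag, q) pairs, best first."""
    prefs = []
    for item in header.split(","):
        tag, *params = item.strip().split(";")
        tag = tag.strip().lower()
        if not tag:
            continue
        q = 1.0  # a missing q parameter means q=1
        for param in params:
            name, _, value = param.strip().partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        prefs.append((tag, q))
    # sorted() is stable, so equal weights keep the browser's order.
    return sorted(prefs, key=lambda pref: pref[1], reverse=True)


def choose_locale(header, available=AVAILABLE, default=DEFAULT):
    """Return the best available locale for the header, or the default."""
    for tag, q in parse_accept_language(header or ""):
        if q <= 0:
            continue  # q=0 explicitly marks a language as unacceptable
        if tag in available:
            return tag
        primary = tag.split("-")[0]  # "de-at" -> "de"
        if primary in available:
            return primary
    return default


def home_page_for(header):
    """Path that a bare request for www.openoffice.org/ could be redirected to."""
    return "/" + choose_locale(header) + "/"
```

For example, a browser sending `de-AT,de;q=0.9,en;q=0.8` would be sent to /de/, while one asking only for French falls back to /en/.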
>>>
>>>>
>>>> -Rob
>>>>
>>>>> If we use JSON objects, then Pootle becomes an elegant tool for
>>>>> translation, because it knows how to handle XML, and if we want to stay
>>>>> with PO files it's about one day of work in genLang.
>>>>>
>>>>> A number of top companies (incl. the one I used to work for) do it like
>>>>> this; they of course then hire a translator to translate the JSON
>>>>> objects.
>>>>>
>>>>> Splitting functionality and text is the key; once that's done, the rest
>>>>> is trivial work.
>>>>>
>>>>> This will of course make the CMS a bit of overkill, but I can live with
>>>>> that :-)
>>>>
>>>>> rgds
>>>>> jan I.
>>>>>
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> -Rob
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: dev-unsubscribe@openoffice.apache.org
>>>>>> For additional commands, e-mail: dev-help@openoffice.apache.org