Mailing-List: contact dev-help@openoffice.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@openoffice.apache.org
MIME-Version: 1.0
In-Reply-To: <308FCCD2-64EB-46C3-9B3F-B43B26570D1F@apache.org>
References: 
 <CAP-ksohCn3SmW_ajP11wvv6tApDNkrmiJOBR56GwEeWd8+5Hpg@mail.gmail.com>
	<CAK2iWdQTmCyBHO+31jduWtPY_xf0HFuNSs_Bn2vrQRP6Ej8Lgw@mail.gmail.com>
	<CAP-ksoi9UWyQdJyXscSk_ObRaP0i2Yd2KNsXhojs6xzT_=QcNw@mail.gmail.com>
	<CAK2iWdS3w-rzXjjPJbknm0vknaBRu4X-Ee5k=FM4vb0Hzfhhsg@mail.gmail.com>
	<CAP-ksojc9E78Go9yKxzuCO03cf54MvjXuakSv1cSP+QaJy0OQw@mail.gmail.com>
	<308FCCD2-64EB-46C3-9B3F-B43B26570D1F@apache.org>
Date: Mon, 26 Aug 2013 10:20:43 -0400
Message-ID: 
 <CAP-ksojuiRcjXQciL6Y027N18gAo-YZKMcEkUcCMhdh-Tqqz4A@mail.gmail.com>
Subject: Re: Brainstorming: Can we refactor the website to make translation
 easier?
From: Rob Weir <robweir@apache.org>
To: "dev@openoffice.apache.org" <dev@openoffice.apache.org>
Content-Type: text/plain; charset=UTF-8

On Mon, Aug 26, 2013 at 9:57 AM, Dave Fisher <wave@apache.org> wrote:
> Since some of us, like me, are on vacation, would it be possible to either put this into a cwiki page or other clear summary? I have a lot I can add about the current setup, like why you lost the rightnav, i.e., you need a subdirectory in templates.
>

Great.  I was hoping that you in particular would weigh in on this,
since I know you have given this a lot of thought as well.

I'll continue hacking in the xx directory and start a new thread
summarizing the state of the work in a couple weeks.

-Rob


> Regards,
> Dave
>
> On Aug 26, 2013, at 8:45 AM, Rob Weir <robweir@apache.org> wrote:
>
>> On Fri, Aug 23, 2013 at 5:12 PM, janI <jani@apache.org> wrote:
>>> On 23 August 2013 21:11, Rob Weir <robweir@apache.org> wrote:
>>>
>>>> On Fri, Aug 23, 2013 at 12:21 PM, janI <jani@apache.org> wrote:
>>>>> On 23 August 2013 17:58, Rob Weir <robweir@apache.org> wrote:
>>>>>
>>>>>> (Responses to dev@openoffice.apache.org, please)
>>>>>>
>>>>>> Obviously our website is quite large.  Google reports 21207 pages
>>>>>> indexed in the www subdomain, and a further 48075 pages in the wiki
>>>>>> subdomain.   But for purpose of this post, when I talk about the "home
>>>>>> page" I'm talking about the contents of our main index.html and the
>>>>>> most commonly visited pages directly linked to it, e.g., the
>>>>>> why/download/product/get-involved, etc. pages.
>>>>>>
>>>>>> This core homepage content amounts to around 25 pages.
>>>>>>
>>>>>> Today this content is scattered around the content tree.  Some of it
>>>>>> is in the root.  Some of it in /why and /download directories.  Some
>>>>>> of it is template-related and is in /templates rather than in
>>>>>> /content.
>>>>>>
>>>>>> As a test I tried to create my own NL page, in the fictitious "xx"
>>>>>> locale.  You can see it here:  http://www.openoffice.org/xx/
>>>>>>
>>>>>> It is not working correctly, but it already required a lot of
>>>>>> non-trivial hacking:
>>>>>>
>>>>>> 1) I had to hunt around and guess which files to copy.  Do I copy
>>>>>> scripts, images and CSS, or just content pages?   Some of the
>>>>>> directories had outdated content that was not linked to by anyone.
>>>>>> It was hard to figure out what the minimum amount of content needed
>>>>>> was, and where it was located.
>>>>>>
>>>>>> 2) The main index.html file had to be edited to refer to CSS in the
>>>>>> root, rather than the current directory.
>>>>>>
>>>>>> 3) The download page is messed up, missing CSS and/or scripts.
>>>>>> Presumably I need to copy something into the xx/download dir, or edit
>>>>>> scripts to make them refer to /download off the root.
>>>>>>
>>>>>> 4) The /xx/why pages are not showing the right side navigation now.  I
>>>>>> must have missed something there as well.
>>>>>>
>>>>>> Of course, I could figure the above out eventually.  It just requires
>>>>>> some time and effort and trial and error.  But none of this is
>>>>>> documented, and even if it were this is a fragile approach and
>>>>>> probably beyond the web development skills of a typical translator.
>>>>>>
>>>>>> But we do know this has been done for some languages.  They got it to
>>>>>> work.  The German page is a good example:
>>>>>>
>>>>>> http://www.openoffice.org/de/
>>>>>>
>>>>>> Now this looks good, but it is still a messy thing from a maintenance
>>>>>> perspective.  If we make structural changes to the main English page,
>>>>>> then those changes need to be manually merged into every NL page.
>>>>>>
>>>>>> What can we do to improve this?
>>>>>>
>>>>>> Here's my idea:
>>>>>>
>>>>>> 1) What if we refactored the home page so it was all self-contained
>>>>>> into these directories:   /scripts,  /styles,  /images and /en/?
>>>>>>
>>>>>> 2) Make the /en directory be pure content.  Only the stuff that needs
>>>>>> to be translated.  It loads everything else, scripts, images, etc.,
>>>>>> via URLs relative to the root, e.g., in /scripts, /styles, etc.
>>>>>>
>>>>>> 3) Reduce or eliminate any embedded Javascript within pages.  For
>>>>>> example, refactor the code in download/index.html so it is external
>>>>>> and depends on JSON resource files for translated strings.  Aim so
>>>>>> translators never need to touch script.
>>>>>>
>>>>>> 4) Ultimate goal is for someone to be able to jump start a new NL home
>>>>>> page by simply requesting an svn copy of the /en directory, and then
>>>>>> editing the resulting files.  No one should ever need to do what I'm
>>>>>> doing with the "xx" pages.
>>>>>>
>>>>>> 5) Maintenance is far easier.  Most things, like changing the scripts,
>>>>>> are done in one place only.  But even changes to the HTML are easier.
>>>>>> Since we then have a common branch point via the svn copy, when
>>>>>> structural changes are added to the main /en HTML, these can be merged
>>>>>> in more elegantly to the translated versions, using Subversion.
>>>>>>
>>>>>> 6) Via Apache redirects we can ensure that the default call to
>>>>>> www.openoffice.org/ goes to /en/.  Conceivably we could also do locale
>>>>>> detection and send requests automatically to the appropriate NL home
>>>>>> page.
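A minimal sketch of the locale detection mentioned in item 6, assuming the
server hands us the browser's Accept-Language header and that we know which
NL directories exist.  The function name and the set of directories are
hypothetical; a real deployment would more likely do this in the web server
configuration than in a script.

```python
def pick_locale_dir(accept_language, available, default="en"):
    """Pick an NL directory name from an Accept-Language header value.

    `available` is the set of locale directories that exist on the site
    (an assumption for this sketch, e.g. {"en", "de", "xx"}).
    """
    candidates = []
    for position, part in enumerate(accept_language.split(",")):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        tag = tag.strip().lower()
        params = params.strip()
        quality = 1.0  # a tag without an explicit q value counts as q=1
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed q value as "not acceptable"
        # Negate quality so that sorting puts the most preferred tag first;
        # position keeps the header order for ties.
        candidates.append((-quality, position, tag))

    for neg_quality, _, tag in sorted(candidates):
        if neg_quality == 0:
            continue  # q=0 means the language is explicitly refused
        if tag in available:
            return tag
        primary = tag.split("-")[0]  # "de-de" falls back to "de"
        if primary in available:
            return primary
    return default
```

For example, pick_locale_dir("de-DE,de;q=0.9,en;q=0.8", {"en", "de"})
selects "de", and a request with no usable header falls back to "en".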
>>>>>>
>>>>>> A variation on the above would be to use Pootle, rather than svn
>>>>>> copy/merge to maintain the translations.  But that would require the
>>>>>> same refactoring work to enable it.  And it would require further
>>>>>> investigation to identify a way of extracting and merging translation
>>>>>> strings in MDText files as well as (X)HTML files.
>>>>>>
>>>>>> This is obviously more than a one-person task.  So I'd be interested
>>>>>> in hearing what you think in general about this approach, whether
>>>>>> there is a simpler alternative I've missed, and whether this is
>>>>>> something you'd be interested in helping with.
>>>>>
>>>>> I like a lot of your ideas, let me add my own experience.
>>>>
>>>> Thanks.
>>>>
>>>>> If our pages do not contain text, and the text is instead moved entirely
>>>>> into one or more JSON objects, then translation becomes easy and the pages
>>>>> themselves stay simple. When the URL is called without arguments, the
>>>>> "en-json" is used, and if called with lang="xx", "xx-json" is used.
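A minimal sketch of the per-language lookup described above, assuming one
JSON string table per language with English as the fallback.  The keys,
the strings and the in-memory tables are made up for illustration; in
practice each table would presumably live in its own file.

```python
import json

# Hypothetical string tables, inlined here so the sketch is self-contained.
EN_JSON = '{"download": "Download", "get_involved": "Get involved"}'
XX_JSON = '{"download": "Xx-Download"}'

TABLES = {"en": json.loads(EN_JSON), "xx": json.loads(XX_JSON)}


def lookup(key, lang=None):
    """Return the string for `key` in `lang`, falling back to English.

    With no `lang` argument (or an unknown one) the English table is used,
    mirroring "called without arguments the en-json is used".
    """
    table = TABLES.get(lang or "en", {})
    return table.get(key, TABLES["en"][key])
```

A string that has not been translated yet simply shows up in English, so a
partially translated table still produces a complete page.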
>>>>
>>>> I like the idea of content/code separation.  We certainly do this in
>>>> the code, for example.  But there are two challenges to taking this
>>>> approach all the way with the website.
>>>>
>>>> 1) If we do JSON everywhere then we have a Javascript dependency
>>>> everywhere.  This has an impact for visibility of the pages to search
>>>> engines, but there are workarounds.  But it may be a bigger issue for
>>>> users who block Javascript.
>>>>
>>> No, we would do that solely on the server side; it would not be a good idea
>>> to have JS retrieve the JSON objects.
>>>
>>> We could, e.g., use PHP to retrieve the correct JSON object and
>>> transmit a finished page.
>>
>> OK.  We're on the same page.
>>
>>>
>>>> 2) There may be cases where a translation requires direct access to
>>>> the HTML or CSS.  For example, I think the Tamil translation needed
>>>> access to specify a specific font.  And for some languages they might
>>>> need to set text direction to RTL.   These kinds of things make almost
>>>> any approach more complicated.
>>>
>>> Look at e.g. our mwiki that handles those details all on server side.
>>>
>>> And just as a suggestion, if we were to use WordPress, things like fonts
>>> would be solved. WP also has an option (not JSON) for multiple languages,
>>> which I could easily adopt in genLang (for translation).
>>>
>>>
>>>>
>>>> So the question we need to answer is how far we take this?   I think
>>>> we have some examples where the code is so intertwined with the text
>>>> that translation becomes very hard and risky.  For example, the
>>>> generation of the "boxes" on the download page.   But then we have
>>>> some other pages, especially the MDText pages, where I would be
>>>> comfortable handing it directly to a translator and expect they could
>>>> edit it without breaking anything.
>>> We can always find examples where it becomes hard, but typically you can
>>> reformulate the problem so it fits in a standard (boxes are no real
>>> problem). The only problem I see is with JS, where we ask and get answers,
>>> e.g. Y/N.
>>>
>>>
>>>>
>>>> The Javascript dependency might be broken if we make this be a CMS
>>>> build-time text replacement rather than a runtime/Javascript
>>>> replacement.  So the CMS would detect when the Pootle files change and
>>>> automatically generate new HTML pages from them.  But even then we
>>>> still would need some runtime integration of strings, specifically on
>>>> the download page where language and OS are determined at runtime
>>>> based on browser request headers.
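A minimal sketch of the build-time text replacement idea, assuming the
translated strings have already been extracted into one dictionary per
language.  The template, the placeholder names and the /styles/home.css
path are hypothetical, and this is not how the CMS actually builds pages.
The optional "dir" entry shows how a language could request RTL without
the translator touching the HTML.

```python
from string import Template

# Hypothetical page skeleton: structure only, no translatable text.
PAGE_TEMPLATE = Template(
    '<html lang="$lang" dir="$dir">\n'
    "<head><title>$title</title>"
    '<link rel="stylesheet" href="/styles/home.css"></head>\n'
    "<body><h1>$title</h1><p>$intro</p></body>\n"
    "</html>\n"
)


def build_pages(translations):
    """Generate one static index.html per language at build time.

    `translations` maps a language code to its string table.  A table may
    override "dir" (for example "rtl"); otherwise "ltr" is assumed.
    A missing string raises KeyError, so an incomplete translation fails
    the build instead of publishing a broken page.
    """
    pages = {}
    for lang, strings in translations.items():
        values = {"lang": lang, "dir": "ltr"}
        values.update(strings)
        pages[lang + "/index.html"] = PAGE_TEMPLATE.substitute(values)
    return pages
```

The output is plain static HTML, so nothing runs on the server per request.
Strings that depend on request headers, like those on the download page,
would still need separate handling.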
>>> I would consider not using the CMS, because we basically don't need it.
>>
>> So within the realm of server-side software, what is possible?  My
>> impression was that Infra generally cautions against runtime server side
>> execution due to the greater opportunity for security problems.
>> That's why we're using build-time page generation.  This approach also
>> performs well, since it is static HTML pages at runtime, and is very
>> stable.
>>
>> But in any case, I think the refactoring work is approximately the
>> same thing regardless of how the pages are generated.
>>
>> -Rob
>>
>>
>>> rgds
>>> jan I.
>>>
>>>>
>>>> -Rob
>>>>
>>>>> If we use JSON objects, then Pootle becomes an elegant tool for
>>>>> translation, because it knows how to handle XML, and if we want to stay
>>>>> with PO files it's about one day of work in genLang.
>>>>>
>>>>> A number of top companies (incl. the one I used to work for) do it like
>>>>> this; they of course then hire a translator to translate the JSON objects.
>>>>>
>>>>> Splitting functionality and text is the key; when that's done the rest is
>>>>> trivial work.
>>>>>
>>>>> This will of course make the CMS a bit overkill, but I can live with that :-)
>>>>
>>>>
>>>>
>>>>
>>>>> rgds
>>>>> jan I.
>>>>>
>>>>>
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> -Rob
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: dev-unsubscribe@openoffice.apache.org
>>>>>> For additional commands, e-mail: dev-help@openoffice.apache.org