openoffice-dev mailing list archives

From Rob Weir <robw...@apache.org>
Subject Re: Brainstorming: Can we refactor the website to make translation easier?
Date Fri, 23 Aug 2013 19:11:26 GMT
On Fri, Aug 23, 2013 at 12:21 PM, janI <jani@apache.org> wrote:
> On 23 August 2013 17:58, Rob Weir <robweir@apache.org> wrote:
>
>> (Responses to dev@openoffice.apache.org, please)
>>
>> Obviously our website is quite large.  Google reports 21207 pages
>> indexed in the www subdomain, and a further 48075 pages in the wiki
>> subdomain.   But for the purposes of this post, when I talk about the "home
>> page" I'm talking about the contents of our main index.html and the
>> most commonly visited pages directly linked to it, e.g., the
>> why/download/product/get-involved, etc. pages.
>>
>> This core homepage content amounts to around 25 pages.
>>
>> Today this content is scattered around the content tree.  Some of it
>> is in the root.  Some of it in /why and /download directories.  Some
>> of it is template-related and is in /templates rather than in
>> /content.
>>
>> As a test I tried to create my own NL page, in the fictitious "xx"
>> locale.  You can see it here:  http://www.openoffice.org/xx/
>>
>> It is not working correctly, but it already required a lot of
>> non-trivial hacking:
>>
>> 1) I had to hunt around and guess which files to copy.  Do I copy
>> scripts, images and CSS, or just content pages?   Some of the
>> directories had outdated content that was not linked to by anyone.
>> It was hard to figure out what the minimum amount of content needed
>> was, and where it was located.
>>
>> 2) The main index.html file had to be edited to refer to CSS in the
>> root, rather than the current directory.
>>
>> 3) The download page is messed up, missing CSS and/or scripts.
>> Presumably I need to copy something into the xx/download dir, or edit
>> scripts to make them refer /download off the root.
>>
>> 4) The /xx/why pages are not showing the right side navigation now.  I
>> must have missed something there as well.
>>
>> Of course, I could figure the above out eventually.  It just requires
>> some time and effort and trial and error.  But none of this is
>> documented, and even if it were this is a fragile approach and
>> probably beyond the web development skills of a typical translator.
>>
>> But we do know this has been done for some languages.  They got it to
>> work.  The German page is a good example:
>>
>> http://www.openoffice.org/de/
>>
>> Now this looks good, but it is still a messy thing from a maintenance
>> perspective.  If we make structural changes to the main English page,
>> then those changes need to be manually merged into every NL page.
>>
>> What can we do to improve this?
>>
>> Here's my idea:
>>
>> 1) What if we refactored the home page so it was all self-contained
>> into these directories:   /scripts,  /styles,  /images and /en/?
>>
>> 2) Make the /en directory be pure content.  Only the stuff that needs
>> to be translated.  It loads everything else, scripts, images, etc.,
>> via URLs relative to the root, e.g., in /scripts, /styles, etc.
>>
>> 3) Reduce or eliminate any embedded JavaScript within pages.  For
>> example, refactor the code in download/index.html so it is external
>> and depends on JSON resource files for translated strings.  Aim so
>> translators never need to touch script.
>>
>> 4) Ultimate goal is for someone to be able to jump start a new NL home
>> page by simply requesting an svn copy of the /en directory, and then
>> editing the resulting files.  No one should ever need to do what I'm
>> doing with the "xx" pages.
>>
>> 5) Maintenance is far easier.  Most things, like changing the scripts,
>> are done in one place only.  But even changes to the HTML are easier.
>> Since we then have a common branch point via the svn copy, when
>> structural changes are added to the main /en HTML, these can be merged
>> in more elegantly to the translated versions, using Subversion.
>>
>> 6) Via Apache redirects we can ensure that the default call to
>> www.openoffice.org/ goes to /en/.  Conceivably we could also do locale
>> detection and send requests automatically to the appropriate NL home
>> page.
>>
>> A variation on the above would be to use Pootle, rather than svn
>> copy/merge to maintain the translations.  But that would require the
>> same refactoring work to enable it.  And it would require further
>> investigation to identify a way of extracting and merging translation
>> strings in MDText files as well as (X)HTML files.
>>
>> This is obviously more than a one-person task.  So I'd be interested
>> in hearing what you think in general about this approach, whether
>> there is a simpler alternative I've missed, and whether this is
>> something you'd be interested in helping with.
>>
>
> I like a lot of your ideas, let me add my own experience.
>

Thanks.

> If our pages do not contain text, and the text is instead moved entirely
> into one or more JSON objects, then translation becomes easy and the pages
> themselves stay simple. When the URL is called without arguments, the
> "en-json" is used, and if it is called with lang="xx", "xx-json" is used.
>

I like the idea of content/code separation.  We certainly do this in
the code, for example.  But there are two challenges to taking this
approach all the way with the website.

1) If we do JSON everywhere then we have a JavaScript dependency
everywhere.  This has an impact on the visibility of the pages to search
engines, though there are workarounds.  It may be a bigger issue for
users who block JavaScript.

2) There may be cases where a translation requires direct access to
the HTML or CSS.  For example, I think the Tamil translation needed
access to specify a specific font.  And for some languages they might
need to set text direction to RTL.   These kinds of things make almost
any approach more complicated.
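
To make the JSON-per-locale idea and the font/direction concern a bit
more concrete, here is a minimal sketch in Python.  Everything in it is
an assumption for illustration only: the catalog layout (a "meta" block
next to a "strings" block), the key names, the "ExampleFont" value, and
the idea that the fictitious "xx" locale is RTL.  It only shows one way
a lookup could fall back to English when a string or a setting has not
been translated yet.

```python
import json

# Hypothetical per-locale catalogs.  In a real setup each would be its own
# JSON file; they are inlined here so the sketch is self-contained.
CATALOGS = {
    "en": json.loads(
        '{"meta": {"dir": "ltr"},'
        ' "strings": {"download": "Download", "why": "Why OpenOffice?"}}'
    ),
    # The fictitious "xx" locale overrides direction and font, and has
    # only translated one of the two strings so far.
    "xx": json.loads(
        '{"meta": {"dir": "rtl", "font": "ExampleFont"},'
        ' "strings": {"download": "XX-Download"}}'
    ),
}


def lookup(key, lang="en"):
    """Return the string for `key` in `lang`, falling back to English."""
    strings = CATALOGS.get(lang, CATALOGS["en"])["strings"]
    if key in strings:
        return strings[key]
    return CATALOGS["en"]["strings"][key]


def page_attributes(lang="en"):
    """Return layout settings: English defaults plus any locale overrides."""
    meta = dict(CATALOGS["en"]["meta"])
    meta.update(CATALOGS.get(lang, {}).get("meta", {}))
    return meta
```

With something like this, a translator would only edit the catalog for
their locale, and things like text direction or a font override would
live in the same file rather than in hand-edited HTML or CSS.  Whether
that covers every case a translation needs is an open question.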

So the question we need to answer is how far we take this?   I think
we have some examples where the code is so intertwined with the text
that translation becomes very hard and risky.  For example, the
generation of the "boxes" on the download page.   But then we have
some other pages, especially the MDText pages, where I would be
comfortable handing it directly to a translator and expect they could
edit it without breaking anything.

The JavaScript dependency might be avoided if we make this a CMS
build-time text replacement rather than a runtime/JavaScript
replacement.  So the CMS would detect when the Pootle files change and
automatically generate new HTML pages from them.  But even then we
still would need some runtime integration of strings, specifically on
the download page where language and OS are determined at runtime
based on browser request headers.
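
A rough sketch of what build-time replacement could look like, using
only the Python standard library.  This is not how the CMS works today;
the template text, the string keys, the output paths and the in-memory
dictionaries (standing in for whatever would be exported from Pootle)
are all hypothetical.  The point is only that the generated pages are
static HTML, so no JavaScript is needed to show translated text.

```python
import html
from string import Template

# Hypothetical page template; $-placeholders mark translatable strings.
TEMPLATE = Template(
    '<html lang="$lang"><body><h1>$title</h1>'
    '<a href="/download/">$download</a></body></html>'
)

# Hypothetical source strings and translations.
EN = {"title": "Apache OpenOffice", "download": "Download"}
TRANSLATIONS = {"xx": {"download": "XX-Download"}}


def render(lang):
    """Render the page for `lang`, using English for untranslated strings."""
    strings = dict(EN)
    strings.update(TRANSLATIONS.get(lang, {}))
    # Escape the strings so a translation cannot inject markup by accident.
    escaped = {key: html.escape(value) for key, value in strings.items()}
    return TEMPLATE.substitute(lang=lang, **escaped)


def build_all():
    """Return a mapping of output path to generated HTML for every locale."""
    return {f"/{lang}/index.html": render(lang) for lang in ["en", *TRANSLATIONS]}
```

A build step would call build_all() whenever the translation files
change and write the results out.  The download page would still need
some strings available at runtime, as noted above.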

-Rob

> If we use JSON objects, then Pootle becomes an elegant tool for
> translation, because it knows how to handle XML, and if we want to stay
> with PO files it's about one day's work in genLang.
>
> A number of top companies (incl. the one I used to work for) do it like
> this, they of course then hire a translator to translate the json objects.
>
> Splitting functionality and text is the key; when that's done the rest is
> trivial work.
>
> This will of course make the CMS a bit overkill, but I can live with that :-)
>




> rgds
> jan I.
>
>
>>
>> Regards,
>>
>> -Rob
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@openoffice.apache.org
>> For additional commands, e-mail: dev-help@openoffice.apache.org
>>
>>


