incubator-ooo-dev mailing list archives

From Dave Fisher <dave2w...@comcast.net>
Subject Re: Crazy idea: Use Google to translate website
Date Tue, 03 Jul 2012 20:23:19 GMT

On Jul 2, 2012, at 5:49 PM, Rob Weir wrote:

> On Mon, Jul 2, 2012 at 7:18 PM, Dave Fisher <dave2wave@comcast.net> wrote:
>> Sorry for the top post. I like where this is going. A lot of interesting ideas.
>> 
>> I have one major concern. How do we manage the human created content as people replace
>> and/or edit the translations? What happens when the original English (or French) page is
>> changed? To me we are really discussing managing Markdown text. If the names of files are like:
>> 
>> index.mdtext
>> index.en-GB.mdtext
>> index.fr.mdtext
>> 

Prefacing my remarks with a general agreement to your ideas, but with exceptions that we should
consider.

> 
> Wouldn't you say that 99% of the website is HTML and Wiki Text today?
> There is very little Markdown in use outside of the Podling project
> pages.

Yes, that is true. However all the markdown created will be AL2 licensed while the html is
mostly something from the TermsOfUse that was discussed in a different thread.

If you are concerned with a clear IP trail, and I know you are, then we should proceed with
Markdown and select HTML.

> 
> In any case, it should be possible to use Pootle for this, just as we
> manage changing product strings and updating those.
> 
> There are a good number of convertors for getting to/from Pootle
> format:  http://translate.sourceforge.net/wiki/toolkit/index
> 
> Note html2po.
> 
> I bet writing mdtext2po (and the inverse) would be possible.
> 
> If we used Pootle for this, we'd need to define some sort of schedule,
> since it is not really a "release" in the traditional sense.  But you
> could imagine every month or so, doing a cycle of:
> 
> 1) html2po and mdtext2po
> 
> 2) Load into Pootle
> 
> 3) Volunteers translate
> 
> 4) At specified time run po2html and po2mdtext
> 
> 5) check in the new website files

This can work for content that does not need a quick response. Other files like announcements and
various news sources require a more immediate approach. This same goal is the initial goal
of the Apache CMS. It is to eliminate extraneous process.

I think that process for translations is good and you provide one that will work for most
content.
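
Since mdtext2po does not exist yet, here is a rough Python sketch of what it could look like. It assumes an mdtext file is a block of "Key: value" header lines, then a blank line, then the Markdown body, and it treats each blank-line-separated paragraph as one translation unit. The function names and the paragraph-level splitting are my assumptions for illustration, not an existing Translate Toolkit converter.

```python
def split_mdtext(text):
    """Split mdtext into (header_lines, paragraphs).

    Assumes headers are "Key: value" lines ending at the first blank line.
    """
    head, sep, body = text.partition("\n\n")
    header_lines = head.splitlines()
    if not sep or not all(":" in line for line in header_lines):
        # No recognizable header block: treat everything as body.
        header_lines, body = [], text
    paragraphs = [p.strip() for p in body.split("\n\n") if p.strip()]
    return header_lines, paragraphs


def po_quote(s):
    """Escape a string for use as a PO msgid or msgstr."""
    s = s.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"' + s + '"'


def mdtext2po(text, source_name):
    """Emit one PO entry per paragraph, with an empty msgstr to fill in."""
    _, paragraphs = split_mdtext(text)
    entries = []
    for number, para in enumerate(paragraphs, start=1):
        entries.append(
            '#: {}:{}\nmsgid {}\nmsgstr ""\n'.format(
                source_name, number, po_quote(para)
            )
        )
    return "\n".join(entries)
```

The inverse (po2mdtext) would need the same paragraph numbering to put translated text back in place, which is one reason to keep the split rule this simple.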

I would add that the branding and navigation (but not the footer) markdown / ssi's would benefit
from this approach.

> 
>> We'll have some type of Apache CMS magic that can handle translated SSI elements.
>> I need to write Joe / infra-dev an email...
>> 
>> Then if we can tie together the CMS to take translations and somehow inform either
>> or both the human and/or the tool translators when changes occur in other languages ... svn
>> diff can be used... assuming that...
>> 
> 
> The issue is the average translator is not a markup (or markdown)
> person.  They use Pootle or similar tools that facilitate translation.
> What do we need to be translator-friendly?  Consistency between how
> we translate UI and webpages might help.

What we are targeting here is the NL user who notices a problem with a translation and wants
to help. These people may not find Pootle to be any more usable than HTML.

Let's not let a Pootle "all encompassing" process break the ability of committers and contributors
to make ad hoc contributions. If the tool you describe is built then it should certainly carry
some svn tags so that merges of updated content from pootle don't overwrite any CMS based
contributions that have been made in the interim. These will be merge conflicts.
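
To make the overwrite concern concrete, here is a minimal Python sketch of the check I have in mind. It uses a content digest as a stand-in for the svn revision recorded at export time; the function names and the dict-based inputs are assumptions for illustration only, and a real tool would ask svn for the revision instead.

```python
import hashlib


def digest(text):
    """Content fingerprint; a stand-in for the svn revision of the file."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def record_export(files):
    """At export time, remember what each file looked like.

    files maps a path to its text at the moment it is sent to Pootle.
    """
    return {path: digest(text) for path, text in files.items()}


def plan_import(exported, current, translated):
    """Decide which translated files are safe to write back.

    exported   -- digests saved by record_export()
    current    -- path -> text as it is in the CMS now
    translated -- path -> text coming back from Pootle
    Returns (safe, conflicts): safe maps paths to the new text to write,
    conflicts lists paths that changed in the CMS since the export.
    """
    safe, conflicts = {}, []
    for path, new_text in sorted(translated.items()):
        if path in current and digest(current[path]) != exported.get(path):
            conflicts.append(path)  # edited in the CMS in the interim
        else:
            safe[path] = new_text
    return safe, conflicts
```

Anything that lands in the conflicts list would go to a human for a merge rather than being written over.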

As long as your process includes (1) every time then there isn't a problem with that. However
that won't work because the translations in Pootle are where the bulk of the work occurs.

I think there are benefits to both the CMS and the Pootle approaches. More thought will need
to go into the timeline and how to fully leverage the human element between the project's
base content, NL users and L10N Translators. As always our goal is to allow more and more
of the community to be able to easily contribute.

Let's divide up the process as follows. I'm adding the notion of a string table which could
be implemented as a file for each language with:

string.mdtext
homepage: home

string.fr.mdtext
homepage: maison

string.en-GB.mdtext
homepage: home

string.it.mdtext
homepage: casa
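
As a rough illustration of how a build step might read these tables, here is a small Python sketch. The "key: value" line format and the file names come from the examples above; the fallback order (en-GB, then en, then the default string.mdtext) and the function names are my assumptions.

```python
def parse_string_table(text):
    """Parse "key: value" lines into a dict, skipping other lines."""
    table = {}
    for line in text.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            table[key.strip()] = value.strip()
    return table


def table_name(lang=None):
    """File name for a language's table; None means the default table."""
    return "string.mdtext" if lang is None else "string.{}.mdtext".format(lang)


def lookup(tables, key, lang):
    """Find key for lang, falling back from e.g. en-GB to en to the default."""
    candidates = [lang]
    if "-" in lang:
        candidates.append(lang.split("-", 1)[0])
    candidates.append(None)
    for candidate in candidates:
        table = tables.get(table_name(candidate), {})
        if key in table:
            return table[key]
    raise KeyError(key)
```

With the four tables above loaded into a dict keyed by file name, lookup(tables, "homepage", "it") would give "casa", and a language with no table of its own would get the default "home".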

(A) Apache CMS - Web Content is Edited  / Website's built.
	Some changes will be via string (sledgehammer ... ) and others by content page.

(My outline)

(B) CMS to Pootle process

> 1) html2po and mdtext2po

	Keep an index of the files included in this set. Perhaps use an attribute (mdtext) or metatag
(html) to self identify translatable files.
	(footer.mdtext must remain in English since it has legal implications and translators are
unlikely to be IP lawyers.)
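
Here is a minimal Python sketch of how that index could be built for the mdtext side. The "Translatable: yes" header line is a hypothetical attribute name I made up for the example, and the header block is assumed to end at the first blank line. footer.mdtext is skipped unconditionally.

```python
import os

# Files that must stay in English regardless of any header.
ALWAYS_ENGLISH = {"footer.mdtext"}


def is_translatable(text):
    """True if the header block carries the (hypothetical) opt-in line."""
    for line in text.splitlines():
        if not line.strip():
            break  # end of header block
        key, _, value = line.partition(":")
        if key.strip().lower() == "translatable":
            return value.strip().lower() in ("yes", "true")
    return False


def build_index(root):
    """Return sorted relative paths of mdtext files marked translatable."""
    index = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".mdtext") or name in ALWAYS_ENGLISH:
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8") as handle:
                if is_translatable(handle.read()):
                    index.append(os.path.relpath(path, root))
    return sorted(index)
```

The list this returns is what (D) would read back later, so it should be saved alongside the exported po files.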

> 2) Load into Pootle


	Include string table changes made through CMS.
	During load handle any differences caused by conflicts between changes made in (C) since
the last (D)

(C) Pootle Apache Instance

> 3) Volunteers translate

	Committers by name and contributors from po or other files.

(D) Pootle to CMS process

> 4) At specified time run po2html and po2mdtext

	Use the index from (B) of the files included.

> 5) check in the new website files

	During merge handle conflicts between changes made in (A) since (B)

(A) and (C) are continuous.

(B) and (D) can be done in whatever sequence and frequency make sense to the L10N team.

Are we getting to a reasonable framework? There is certainly detail work about conversion in
and out, but I do think this is something we can incrementally work towards together.

Regards,
Dave

> 
>> With markdown it will be easy to have a header parameter that will signal the inclusion
>> of an SSI detailing the machine translated page vs. human translation situation. By making
>> it an SSI and translatable it can become something different language groups can handle in
>> an organic way. We'll have an objective measure of the engagement of different language communities
>> based on the number of edits, number of translators and how up to date and/or responsive
>> they are.
>> 
>> I think we could start by creating a test-auto.mdtext file, and using the translate.google
>> to convert it to 100 pages. Put the scripts in the ooo-site/trunk/tools/ directory. If they
>> are perl scripts then in ooo-site/lib/.
>> 
>> Regards,
>> Dave
>> 
>> On Jul 2, 2012, at 2:43 PM, Kay Schenk wrote:
>> 
>>> On Mon, Jul 2, 2012 at 2:27 PM, Rob Weir <robweir@apache.org> wrote:
>>> 
>>>> On Mon, Jul 2, 2012 at 4:20 PM, Kay Schenk <kay.schenk@gmail.com> wrote:
>>>>> On Mon, Jul 2, 2012 at 7:14 AM, Rob Weir <robweir@apache.org> wrote:
>>>>> 
>>>>>> On Mon, Jul 2, 2012 at 9:57 AM, Donald Whytock <dwhytock@gmail.com> wrote:
>>>>>>> You don't have to use Google Translate for the entire site into a
>>>>>>> given language.  Better than no page at all in a given language is a
>>>>>> 
>>>>>> True.   To enable this integration requires adding markup to two
>>>>>> places in the HTML file:
>>>>>> 
>>>>>> 1) Load some script in the <head> section
>>>>>> 
>>>>>> 2) Add a Google-provided <div> to wherever in the page we want the
>>>>>> language selector drop down to be.
>>>>>> 
>>>>>> It would be really easy to add this to a small number of selected pages.
>>>>>> 
>>>>>> It would also be easy to add to all pages via the CMS template.
>>>>>> 
>>>>>> What would be hard is managing this for a large number of pages, but
>>>>>> not all pages.
>>>>>> 
>>>>>>> page in a given language that says, "Hi there!  This is the site for
>>>>>>> Apache OpenOffice.  We welcome translations of our site into your
>>>>>>> language, and invite you to volunteer at the following email address:
>>>>>>> <blah> Or you can submit a translation through Google Translate, which
>>>>>>> was used to produce this page."
>>>>>>> 
>>>>>>> Something as short as that is less likely to be garbled in
>>>>>>> auto-translation than something technical, and it tells potential
>>>>>>> contributors what to do to help out.
>>>>>>> 
>>>>>> 
>>>>>> The trick would be to get people to visit that page.  Unless it was on
>>>>>> the home page.
>>>>>> 
>>>>>> -Rob
>>>>>> 
>>>>>>> Don
>>>>>> 
>>>>> 
>>>>> OK, it took me a little while to weed through Google's info on this.
>>>>> 
>>>>> A good sample can be found at:
>>>>> 
>>>>> 
>>>>> http://googleblog.blogspot.com/2009/09/translate-your-website-with-google.html
>>>>> 
>>>>> Is there any possibility we could add the gadget to the OOo blogs site --
>>>>> 
>>>>> https://blogs.apache.org/OOo/
>>>>> 
>>>>> just for fun and see what we think?
>>>>> This way we'd just be impacting one page and not a whole site.
>>>>> 
>>>> 
>>>> If we want access to review and approve suggestions made by readers
>>>> then it needs to be on a domain that we "own".  This is in common with
>>>> most Google services, you need to demonstrate that you control the
>>>> domain, typically by adding a special META tag to the homepage.  For
>>>> *.openoffice.org this is easy, and I've already done this to enable
>>>> Google Analytics.  If we want to do the same for the blog we'd need
>>>> the ability to insert special markup into the <head> and <body> of the
>>>> blog template.  I'm not sure whether this is possible with our Roller
>>>> setup.
>>>> 
>>> 
>>> oh -- well too bad. It could have been fun.
>>> 
>>> 
>>>> 
>>>> Another way of testing this, in a quantitative way, is via what is
>>>> called "A/B Testing".  With this approach we define an action a
>>>> satisfied site visitor might take, like downloading AOO 3.4.  Then we
>>>> randomly show users either the original home page (or download page or
>>>> any other page we're testing).  This is "A", and then we show other
>>>> users a different version, B.  For example, B could have the
>>>> translation enabled.  Then we run this "experiment" for a period of
>>>> time, like a week or two, tracking which version of the page has the
>>>> higher success rate with users.
>>>> 
>>> 
>>> hmmmm...interesting
>>> 
>>> OK, I've looked at the rest of your post here and will think about this for
>>> a bit.
>>> 
>>> 
>>>> 
>>>> If the machine translated page confuses users, or makes
>>>> them suspect the page, then the download %'s will be lower than the
>>>> original page.  And if the translated page is helpful then the
>>>> download numbers would be higher.
>>>> 
>>>> You could imagine other success indicators.  Pretty much anything that
>>>> has a URL can be measured.   For example, imagine we add a link, "This
>>>> page solved my problem" to the bottom of every documentation page.
>>>> Even though the link would just go to a "thanks" page, we could use
>>>> that action to measure the success of translated versus untranslated
>>>> pages.
>>>> 
>>>> Of course, we don't need to do this all at once.  But I'd recommend we
>>>> think of ways of quantifying success.  The website serves our users.
>>>> How do we know what is working well and what isn't?  How can we design
>>>> experiments to test alternative approaches?
>>>> 
>>>> 
>>>> Possible successes for users might be:
>>>> 
>>>> - downloaded AOO
>>>> 
>>>> - found answer to their question
>>>> 
>>>> - signed up for our announcement list
>>>> 
>>>> - entered their first bug report
>>>> 
>>>> - signed up for one of the project lists
>>>> 
>>>> - make first wiki contribution
>>>> 
>>>> - followed/liked/+1'ed us on one of our social networking sites
>>>> 
>>>> Measure, improve, repeat.   Constant improvement and optimization.
>>>> 
>>>> We can debate what will improve the website for the users.  Or we can
>>>> test and measure.  A/B testing is a new option for us, a technique
>>>> that once was used only by the largest commercial websites, but is now
>>>> available to everyone via Google's "content experiments" support in
>>>> Google Analytics.
>>>> 
>>>> -Rob
>>>> 
>>>>> I think that might be a perfect application for something like this.
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> 
>>>>> ----------------------------------------------------------------------------------------
>>>>> MzK
>>>>> 
>>>>> "I would rather have a donkey that takes me there
>>>>> than a horse that will not fare."
>>>>>                                         -- Portuguese proverb
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> ----------------------------------------------------------------------------------------
>>> MzK
>>> 
>>> "I would rather have a donkey that takes me there
>>> than a horse that will not fare."
>>>                                         -- Portuguese proverb
>> 

