incubator-ooo-dev mailing list archives

From Kay Schenk <kay.sch...@gmail.com>
Subject Re: Crazy idea: Use Google to translate website
Date Tue, 03 Jul 2012 20:58:30 GMT
On Tue, Jul 3, 2012 at 1:23 PM, Dave Fisher <dave2wave@comcast.net> wrote:

>
> On Jul 2, 2012, at 5:49 PM, Rob Weir wrote:
>
> > On Mon, Jul 2, 2012 at 7:18 PM, Dave Fisher <dave2wave@comcast.net> wrote:
> >> Sorry for the top post. I like where this is going. A lot of
> >> interesting ideas.
> >>
> >> I have one major concern. How do we manage the human created content as
> >> people replace and/or edit the translations? What happens when the original
> >> English (or French) page is changed? To me we are really discussing
> >> managing Markdown text. If the names of files are like:
> >>
> >> index.mdtext
> >> index.en-GB.mdtext
> >> index.fr.mdtext
> >>
>
> Prefacing my remarks with a general agreement to your ideas, but with
> exceptions that we should consider.
>
> >
> > Wouldn't you say that 99% of the website is HTML and Wiki Text today?
> > There is very little Markdown in use outside of the Podling project
> > pages.
>
> Yes, that is true. However all the markdown created will be AL2 licensed
> while the html is mostly something from the TermsOfUse that was discussed
> in a different thread.
>
> If you are concerned with a clear IP trail, and I know you are, then we
> should proceed with Markdown and select HTML.
>
> >
> > In any case, it should be possible to use Pootle for this, just as we
> > manage changing product strings and updating those.
> >
> > There are a good number of convertors for getting to/from Pootle
> > format:  http://translate.sourceforge.net/wiki/toolkit/index
> >
> > Note html2po.
> >
> > I bet writing mdtext2po (and the inverse) would be possible.
> >
> > If we used Pootle for this, we'd need to define some sort of schedule,
> > since it is not really a "release" in the traditional sense.  But you
> > could imagine every month or so, doing a cycle of:
> >
> > 1) html2po and mdtext2po
> >
> > 2) Load into Pootle
> >
> > 3) Volunteers translate
> >
> > 4) At specified time run po2html and po2mdtext
> >
> > 5) check in the new website files
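
To make the mdtext2po idea above a bit more concrete, here is a minimal sketch in plain Python. It is not an existing tool: the function names (mdtext_to_po, translate_mdtext), the choice of one PO entry per blank-line-separated block, and the use of a block number in the "#:" reference comment are all assumptions made for illustration. A real converter would more likely be built on the Translate Toolkit.

```python
import re


def po_escape(s: str) -> str:
    """Escape a string for use inside a PO double-quoted literal."""
    return s.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def split_blocks(text: str) -> list:
    """Split mdtext into blank-line-separated blocks (assumed translation unit)."""
    return [b.strip() for b in re.split(r"\n\s*\n", text) if b.strip()]


def mdtext_to_po(text: str, source_name: str) -> str:
    """Hypothetical mdtext2po: emit one untranslated PO entry per block."""
    entries = []
    for number, block in enumerate(split_blocks(text), start=1):
        entries.append(
            f"#: {source_name}:{number}\n"  # block number, not a line number
            f'msgid "{po_escape(block)}"\n'
            'msgstr ""\n'
        )
    return "\n".join(entries)


def translate_mdtext(text: str, translations: dict) -> str:
    """Hypothetical po2mdtext: rebuild the page, keeping the original block
    wherever no translation is available yet."""
    blocks = split_blocks(text)
    return "\n\n".join(translations.get(b) or b for b in blocks) + "\n"
```

Falling back to the original block when a translation is missing means a partially translated page can still be checked in at step 5.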
>
> This can work for content that does not need a quick response. Other files like
> announcements and various news sources require a more immediate approach.
> This same goal is the initial goal of the Apache CMS. It is to eliminate
> extraneous process.
>
> I think that process for translations is good and you provide one that
> will work for most content.
>
> I would add that the branding and navigation (but not the footer) markdown
> / ssi's would benefit from this approach.
>
> >
> >> We'll have some type of Apache CMS magic that can handle translated SSI
> >> elements. I need to write Joe / infra-dev an email...
> >>
> >> Then if we can tie together the CMS to take translations and somehow
> >> inform either or both the human and/or the tool translators when changes
> >> occur in other languages ... svn diff can be used... assuming that...
> >>
> >
> > The issue is the average translator is not a markup (or markdown)
> > person.  They use Pootle or similar tools that facilitate translation.
> > What do we need to be translator-friendly?  Consistency between how
> > we translate UI and webpages might help.
>
> What we are targeting here is the NL user who notices a problem with a
> translation and wants to help. These people may not find Pootle to be any more
> usable than HTML.
>
> Let's not let a Pootle "all encompassing" process break the ability of
> committers and contributors to make ad hoc contributions. If the tool you
> describe is built then it should certainly carry some svn tags so that
> merges of updated content from pootle don't overwrite any CMS based
> contributions that have been made in the interim. These will be merge
> conflicts.
>
> As long as your process includes (1) every time then there isn't a problem
> with that. However that won't work because the translations in Pootle are
> where the bulk of the work occurs.
>
> I think there are benefits to both the CMS and the Pootle approaches, more
> thought will need to go into the timeline and how to fully leverage the
> human element between the project's base content, NL users and L10N
> Translators. As always our goal is to allow more and more of the community
> to be able to easily contribute.
>
> Let's divide up the process as follows. I'm adding the notion of a string
> table which could be implemented as a file for each language with:
>
> string.mdtext
> homepage: home
>
> string.fr.mdtext
> homepage: maison
>
> string.en-GB.mdtext
> homepage: home
>
> string.it.mdtext
> homepage: casa
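
A rough sketch of how such a string table could be read and queried, in Python. The "key: value" line format is taken from the example above. The fallback order (exact language, then base language, then the default string.mdtext table) and the names parse_string_table and lookup are assumptions, not anything the CMS provides.

```python
def parse_string_table(text: str) -> dict:
    """Parse 'key: value' lines, as in the string.*.mdtext examples."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue
        key, value = line.split(":", 1)
        table[key.strip()] = value.strip()
    return table


def lookup(tables: dict, lang: str, key: str) -> str:
    """Assumed fallback: 'en-GB' -> 'en' -> default table (stored under '')."""
    candidates = [lang]
    if "-" in lang:
        candidates.append(lang.split("-", 1)[0])
    candidates.append("")
    for candidate in candidates:
        value = tables.get(candidate, {}).get(key)
        if value is not None:
            return value
    raise KeyError(key)


# Tables keyed by language code; '' stands for the default string.mdtext.
tables = {
    "": parse_string_table("homepage: home"),
    "fr": parse_string_table("homepage: maison"),
    "it": parse_string_table("homepage: casa"),
}
```

With this, lookup(tables, "fr", "homepage") gives the French string, while a language with no table of its own falls through to the default.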
>
> (A) Apache CMS - Web Content is Edited  / Website's built.
>         Some changes will be via string (sledgehammer ... ) and others by
> content page.
>
> (My outline)
>
> (B) CMS to Pootle process
>
> > 1) html2po and mdtext2po
>
>         Keep an index of the files included in this set. Perhaps use an
> attribute (mdtext) or metatag (html) to self identify translatable files.
>         (footer.mdtext must remain in English since it has legal
> implications and translators are unlikely to be IP lawyers.)
>
> > 2) Load into Pootle
>
>
>         Include string table changes made through CMS.
>         During load handle any differences caused by conflicts between
> changes made in (C) since the last (D)
>
> (C) Pootle Apache Instance
>
> > 3) Volunteers translate
>
>         Committers by name and contributors from po or other files.
>
> (D) Pootle to CMS process
>
> > 4) At specified time run po2html and po2mdtext
>
>         Use the index from (B) of the files included.
>
> > 5) check in the new website files
>
>         During merge handle conflicts between changes made in (A) since (B)
>
> (A) and (C) are continuous.
>
> (B) and (D) can be done in whatever sequence and frequency make sense to
> the L10N team.
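
The "index of the files included" from (B) could be built along these lines. This is only a sketch: the "Translate: yes" header attribute is a made-up name for the self-identifying attribute mentioned above, and the files are passed in as a dict of name to content so the example stays self-contained.

```python
# footer.mdtext stays in English, per the outline above.
ALWAYS_ENGLISH = {"footer.mdtext"}


def header_fields(text: str) -> dict:
    """Read 'Key: value' lines from the top of an mdtext file,
    stopping at the first blank line."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            break
        if ":" in line:
            key, value = line.split(":", 1)
            fields[key.strip().lower()] = value.strip()
    return fields


def build_index(files: dict) -> list:
    """Return the sorted names of files that opt in to translation.
    'Translate: yes' is a hypothetical attribute, not a CMS feature."""
    index = []
    for name, text in sorted(files.items()):
        if name in ALWAYS_ENGLISH:
            continue
        if header_fields(text).get("translate", "").lower() == "yes":
            index.append(name)
    return index
```

Saving the returned list during (B) and reading it back in (D) would keep both steps working on the same set of files.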
>
> Are we getting to a reasonable framework? There is certainly detail work
> about conversion in and out, but I do think this is something we can
> incrementally work towards together.
>
> Regards,
> Dave
>
> >
> >> With markdown it will be easy to have a header parameter that will
> >> signal the inclusion of an SSI detailing the machine translated page vs.
> >> human translation situation. By making it an SSI and translatable it can
> >> become something different language groups can handle in an organic way.
> >> We'll have an objective measure of the engagement of different language
> >> communities based on the number of edits, number of translators and how
> >> up to date and/or responsive they are.
> >>
> >> I think we could start by creating a test-auto.mdtext file, and using
> >> the translate.google to convert it to 100 pages. Put the scripts in the
> >> ooo-site/trunk/tools/ directory. If they are perl scripts then in
> >> ooo-site/lib/.
> >>
> >> Regards,
> >> Dave
> >>
> >> On Jul 2, 2012, at 2:43 PM, Kay Schenk wrote:
> >>
> >>> On Mon, Jul 2, 2012 at 2:27 PM, Rob Weir <robweir@apache.org> wrote:
> >>>
> >>>> On Mon, Jul 2, 2012 at 4:20 PM, Kay Schenk <kay.schenk@gmail.com> wrote:
> >>>>> On Mon, Jul 2, 2012 at 7:14 AM, Rob Weir <robweir@apache.org> wrote:
> >>>>>
> >>>>>> On Mon, Jul 2, 2012 at 9:57 AM, Donald Whytock <dwhytock@gmail.com> wrote:
> >>>>>>> You don't have to use Google Translate for the entire site into a
> >>>>>>> given language.  Better than no page at all in a given language is a
> >>>>>>
> >>>>>> True.   To enable this integration requires adding markup to two
> >>>>>> places in the HTML file:
> >>>>>>
> >>>>>> 1) Load some script in the <head> section
> >>>>>>
> >>>>>> 2) Add a Google-provided <div> to wherever in the page we want the
> >>>>>> language selector drop down to be.
> >>>>>>
> >>>>>> It would be really easy to add this to a small number of selected pages.
> >>>>>>
> >>>>>> It would also be easy to add to all pages via the CMS template.
> >>>>>>
> >>>>>> What would be hard is managing this for a large number of pages, but
> >>>>>> not all pages.
> >>>>>>
> >>>>>>> page in a given language that says, "Hi there!  This is the site for
> >>>>>>> Apache OpenOffice.  We welcome translations of our site into your
> >>>>>>> language, and invite you to volunteer at the following email address:
> >>>>>>> <blah> Or you can submit a translation through Google Translate, which
> >>>>>>> was used to produce this page."
> >>>>>>>
> >>>>>>> Something as short as that is less likely to be garbled in
> >>>>>>> auto-translation than something technical, and it tells potential
> >>>>>>> contributors what to do to help out.
> >>>>>>>
> >>>>>>
> >>>>>> The trick would be to get people to visit that page.  Unless it was on
> >>>>>> the home page.
> >>>>>>
> >>>>>> -Rob
> >>>>>>
> >>>>>>> Don
> >>>>>>
> >>>>>
> >>>>> OK, it took me a little while to weed through Google's info on this.
> >>>>>
> >>>>> A good sample can be found at:
> >>>>>
> >>>>>
> >>>>
> http://googleblog.blogspot.com/2009/09/translate-your-website-with-google.html
> >>>>>
> >>>>> Is there any possibility we could add the gadget to the OOo blogs site --
> >>>>>
> >>>>> https://blogs.apache.org/OOo/
> >>>>>
> >>>>> just for fun and see what we think?
> >>>>> This way we'd just be impacting one page and not a whole site.
> >>>>>
> >>>>
> >>>> If we want access to review and approve suggestions made by readers
> >>>> then it needs to be on a domain that we "own".  This is in common with
> >>>> most Google services, you need to demonstrate that you control the
> >>>> domain, typically by adding a special META tag to the homepage.  For
> >>>> *.openoffice.org this is easy, and I've already done this to enable
> >>>> Google Analytics.  If we want to do the same for the blog we'd need
> >>>> the ability to insert special markup into the <head> and <body> of the
> >>>> blog template.  I'm not sure whether this is possible with our Roller
> >>>> setup.
> >>>>
> >>>
> >>> oh -- well too bad. It could have been fun.
> >>>
> >>>
> >>>>
> >>>> Another way of testing this, in a quantitative way, is via what is
> >>>> called "A/B Testing".  With this approach we define an action a
> >>>> satisfied site visitor might take, like downloading AOO 3.4.  Then we
> >>>> randomly show some users the original home page (or download page, or
> >>>> any other page we're testing).  This is "A".  We show other
> >>>> users a different version, "B".  For example, B could have the
> >>>> translation enabled.  Then we run this "experiment" for a period of
> >>>> time, like a week or two, tracking which version of the page has the
> >>>> higher success rate with users.
> >>>>
> >>>
> >>> hmmmm...interesting
> >>>
> >>> OK, I've looked at the rest of your post here and will think about
> >>> this for a bit.
> >>>
> >>>
> >>>>
> >>>> If the machine translated page confuses users, or makes
> >>>> them suspect the page, then the download %'s will be lower than the
> >>>> original page.  And if the translated page is helpful then the
> >>>> download numbers would be higher.
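
For what it's worth, the comparison described above can be sketched in a few lines of Python. This is not how Google's content experiments work internally. It is just a standard two-proportion z-test plus a simple hash-based way of splitting visitors, and the function names and the idea of keying on a visitor id are assumptions for illustration.

```python
import hashlib
import math


def assign_variant(visitor_id: str) -> str:
    """Deterministically put a visitor in group A or B (assumed scheme)."""
    digest = hashlib.sha256(visitor_id.encode("utf-8")).digest()
    return "A" if digest[0] % 2 == 0 else "B"


def two_proportion_z(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """z statistic for the difference in download rate between B and A.

    conv_* = visitors who downloaded, n_* = visitors shown that version.
    A positive value means B (e.g. translation enabled) converted better.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / std_err
```

As a rule of thumb, an absolute z value above about 1.96 corresponds to a difference that is significant at the 5% level, so small differences over a short run should not be over-read.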
> >>>>
> >>>> You could imagine other success indicators.  Pretty much anything that
> >>>> has a URL can be measured.   For example, imagine we add a link, "This
> >>>> page solved my problem" to the bottom of every documentation page.
> >>>> Even though the link would just go to a "thanks" page, we could use
> >>>> that action to measure the success of translated versus untranslated
> >>>> pages.
> >>>>
> >>>> Of course, we don't need to do this all at once.  But I'd recommend we
> >>>> think of ways of quantifying success.  The website serves our users.
> >>>> How do we know what is working well and what isn't?  How can we design
> >>>> experiments to test alternative approaches?
> >>>>
> >>>>
> >>>> Possible successes for users might be:
> >>>>
> >>>> - downloaded AOO
> >>>>
> >>>> - found answer to their question
> >>>>
> >>>> - signed up for our announcement list
> >>>>
> >>>> - entered their first bug report
> >>>>
> >>>> - signed up for one of the project lists
> >>>>
> >>>> - made their first wiki contribution
> >>>>
> >>>> - followed/liked/+1'ed us on one of our social networking sites
> >>>>
> >>>> Measure, improve, repeat.   Constant improvement and optimization.
> >>>>
> >>>> We can debate what will improve the website for the users.  Or we can
> >>>> test and measure.  A/B testing is a new option for us, a technique
> >>>> that once was used only by the largest commercial websites, but is now
> >>>> available to everyone via Google's "content experiments" support in
> >>>> Google Analytics.
> >>>>
> >>>> -Rob
> >>>>
> >>>>> I think that might be a perfect application for something like this.
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>>
> >>>>
> ----------------------------------------------------------------------------------------
> >>>>> MzK
> >>>>>
> >>>>> "I would rather have a donkey that takes me there
> >>>>> than a horse that will not fare."
> >>>>>                                         -- Portuguese proverb
> >>>>
> >>>
> >>>
> >>>
> >>
>
>
more info on options...

this is an old article, but...

http://www.labnol.org/internet/google-translation-widgets/10135/

as an FYI, the translate "gadget" is still available:

http://www.gstatic.com/

It is JS based and would be a good tool to use for stuff like "announcements"
etc. if we wanted to send them out as HTML instead of text.

I guess this is what I was thinking about when I mentioned the blog
articles. This can be applied on a page by page basis. I have NO idea how
good it is though.

I have no experience with the "product" Rob originally mentioned --

https://translate.google.com/manager/

but since the setup seems to be "bulk" and the mdtext pages actually do get
translated to HTML before display, well, I would imagine the licensing
would get translated also?

So, Rob, in summary, I imagine you were thinking of this for the "English"
portions of our existing site(s) -- NOT the NL areas, correct?

We would need to find out how "tailorable" this is -- maybe we could
exclude areas.


-- 
----------------------------------------------------------------------------------------
MzK

"I would rather have a donkey that takes me there
 than a horse that will not fare."
                                          -- Portuguese proverb
