Subject: Re: Crazy idea: Use Google to translate website
From: Dave Fisher
Date: Tue, 3 Jul 2012 13:23:19 -0700
To: ooo-dev@incubator.apache.org

On Jul 2, 2012, at 5:49 PM, Rob Weir wrote:

> On Mon, Jul 2, 2012 at 7:18 PM, Dave Fisher wrote:
>> Sorry for the top post. I like where this is going. A lot of interesting ideas.
>>
>> I have one major concern. How do we manage the human-created content as people replace and/or edit the translations? What happens when the original English (or French) page is changed? To me we are really discussing managing Markdown text. If the names of files are like:
>>
>> index.mdtext
>> index.en-GB.mdtext
>> index.fr.mdtext
>>

I'll preface my remarks with general agreement with your ideas, but with some exceptions that we should consider.

>
> Wouldn't you say that 99% of the website is HTML and Wiki Text today?
> There is very little Markdown in use outside of the Podling project
> pages.

Yes, that is true. However, all the Markdown created will be AL2-licensed, while the HTML is mostly something from the TermsOfUse that was discussed in a different thread.

If you are concerned with a clear IP trail, and I know you are, then we should proceed with Markdown and select HTML.

>
> In any case, it should be possible to use Pootle for this, just as we
> manage changing product strings and updating those.
>
> There are a good number of convertors for getting to/from Pootle
> format: http://translate.sourceforge.net/wiki/toolkit/index
>
> Note html2po.
>
> I bet writing mdtext2po (and the inverse) would be possible.
>
> If we used Pootle for this, we'd need to define some sort of schedule,
> since it is not really a "release" in the traditional sense.
> But you could imagine every month or so, doing a cycle of:
>
> 1) html2po and mdtext2po
>
> 2) Load into Pootle
>
> 3) Volunteers translate
>
> 4) At specified time run po2html and po2mdtext
>
> 5) check in the new website files

This can work for content that does not need a quick response. Other files, like announcements and various news sources, require a more immediate approach. That is the same as the original goal of the Apache CMS, which is to eliminate extraneous process.

I think that process for translations is good, and you provide one that will work for most content.

I would add that the branding and navigation (but not the footer) Markdown / SSI files would benefit from this approach.

>
>> We'll have some type of Apache CMS magic that can handle translated SSI elements. I need to write Joe / infra-dev an email...
>>
>> Then if we can tie together the CMS to take translations and somehow inform either or both the human and/or the tool translators when changes occur in other languages ... svn diff can be used... assuming that...
>>
>
> The issue is the average translator is not a markup (or markdown)
> person. They use Pootle or similar tools that facilitate translation.
> What do we need to be translator-friendly? Consistency between how
> we translate UI and webpages might help.

What we are targeting here is the NL user who notices a problem with a translation and wants to help. These people may not find Pootle any more familiar than HTML.

Let's not let an "all encompassing" Pootle process break the ability of committers and contributors to make ad hoc contributions. If the tool you describe is built, then it should certainly carry some svn tags so that merges of updated content from Pootle don't overwrite any CMS-based contributions that have been made in the interim. These will be merge conflicts.

As long as your process includes (1) every time, there isn't a problem with that.
However, that won't work, because the translations in Pootle are where the bulk of the work occurs.

I think there are benefits to both the CMS and the Pootle approaches. More thought will need to go into the timeline and into how to fully leverage the human element between the project's base content, NL users and L10N translators. As always, our goal is to allow more and more of the community to contribute easily.

Let's divide up the process as follows. I'm adding the notion of a string table, which could be implemented as a file for each language:

string.mdtext
    homepage: home

string.fr.mdtext
    homepage: maison

string.en-GB.mdtext
    homepage: home

string.it.mdtext
    homepage: casa

(A) Apache CMS: web content is edited / website is built.
    Some changes will be via strings (sledgehammer ...) and others by content page. (My outline)

(B) CMS to Pootle process
> 1) html2po and mdtext2po
    Keep an index of the files included in this set. Perhaps use an attribute (mdtext) or meta tag (HTML) to self-identify translatable files. (footer.mdtext must remain in English since it has legal implications and translators are unlikely to be IP lawyers.)
> 2) Load into Pootle
    Include string table changes made through the CMS.
    During the load, handle any differences caused by conflicts between changes made in (C) since the last (D).

(C) Pootle Apache Instance
> 3) Volunteers translate
    Committers by name and contributors from po or other files.

(D) Pootle to CMS process
> 4) At specified time run po2html and po2mdtext
    Use the index from (B) of the files included.
> 5) check in the new website files
    During the merge, handle conflicts between changes made in (A) since (B).

(A) and (C) are continuous.

(B) and (D) can be done in whatever sequence and frequency make sense to the L10N team.

Are we getting to a reasonable framework? There is certainly detail work to do on conversion in and out, but I do think this is something we can incrementally work towards together.
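The string table idea above can be sketched in a few lines. This is a minimal illustration, assuming each string*.mdtext file holds one "key: value" pair per line and that a missing key falls back from a regional code (fr-CA) to its base language (fr) and then to the untagged string.mdtext. The function names (parse_string_table, candidate_names, lookup, load_tables) and the fallback order are hypothetical, not an existing Apache CMS feature.

```python
from __future__ import annotations

from pathlib import Path


def parse_string_table(text: str) -> dict[str, str]:
    """Parse 'key: value' lines; lines without a colon are ignored."""
    table: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep and key.strip():
            table[key.strip()] = value.strip()
    return table


def candidate_names(lang: str | None, base: str = "string") -> list[str]:
    """Most specific file first, e.g. string.en-GB.mdtext, string.en.mdtext, string.mdtext."""
    names: list[str] = []
    if lang:
        names.append(f"{base}.{lang}.mdtext")
        primary = lang.split("-")[0]
        if primary != lang:
            names.append(f"{base}.{primary}.mdtext")
    names.append(f"{base}.mdtext")
    return names


def lookup(key: str, lang: str | None, tables: dict[str, dict[str, str]]) -> str:
    """Return the first value found along the fallback chain; KeyError if none."""
    for name in candidate_names(lang):
        table = tables.get(name, {})
        if key in table:
            return table[key]
    raise KeyError(key)


def load_tables(directory: Path) -> dict[str, dict[str, str]]:
    """Read every string*.mdtext file in a checkout directory (hypothetical layout)."""
    return {
        path.name: parse_string_table(path.read_text(encoding="utf-8"))
        for path in directory.glob("string*.mdtext")
    }


# The four example files from the outline, held in memory instead of on disk.
tables = {
    "string.mdtext": parse_string_table("homepage: home"),
    "string.fr.mdtext": parse_string_table("homepage: maison"),
    "string.en-GB.mdtext": parse_string_table("homepage: home"),
    "string.it.mdtext": parse_string_table("homepage: casa"),
}
print(lookup("homepage", "fr", tables))     # maison
print(lookup("homepage", "fr-CA", tables))  # maison (falls back to fr)
print(lookup("homepage", "de", tables))     # home (falls back to string.mdtext)
```

Because lookup walks from the most specific file to the most general, a language team could translate only the strings it cares about and still get a complete page.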
Regards,
Dave

>
>> With Markdown it will be easy to have a header parameter that will signal the inclusion of an SSI detailing the machine translated page vs. human translation situation. By making it an SSI and translatable, it can become something different language groups can handle in an organic way. We'll have an objective measure of the engagement of different language communities based on the number of edits, number of translators and how up to date and/or responsive they are.
>>
>> I think we could start by creating a test-auto.mdtext file, and using translate.google to convert it to 100 pages. Put the scripts in the ooo-site/trunk/tools/ directory. If they are Perl scripts, then in ooo-site/lib/.
>>
>> Regards,
>> Dave
>>
>> On Jul 2, 2012, at 2:43 PM, Kay Schenk wrote:
>>
>>> On Mon, Jul 2, 2012 at 2:27 PM, Rob Weir wrote:
>>>
>>>> On Mon, Jul 2, 2012 at 4:20 PM, Kay Schenk wrote:
>>>>> On Mon, Jul 2, 2012 at 7:14 AM, Rob Weir wrote:
>>>>>
>>>>>> On Mon, Jul 2, 2012 at 9:57 AM, Donald Whytock wrote:
>>>>>>> You don't have to use Google Translate for the entire site into a
>>>>>>> given language. Better than no page at all in a given language is a
>>>>>>
>>>>>> True. To enable this integration requires adding markup to two
>>>>>> places in the HTML file:
>>>>>>
>>>>>> 1) Load some script in the <head> section
>>>>>>
>>>>>> 2) Add a Google-provided <div>
>>>>>> to wherever in the page we want the
>>>>>> language selector drop-down to be.
>>>>>>
>>>>>> It would be really easy to add this to a small number of selected pages.
>>>>>>
>>>>>> It would also be easy to add to all pages via the CMS template.
>>>>>>
>>>>>> What would be hard is managing this for a large number of pages, but
>>>>>> not all pages.
>>>>>>
>>>>>>> page in a given language that says, "Hi there! This is the site for
>>>>>>> Apache OpenOffice. We welcome translations of our site into your
>>>>>>> language, and invite you to volunteer at the following email address:
>>>>>>> Or you can submit a translation through Google Translate, which
>>>>>>> was used to produce this page."
>>>>>>>
>>>>>>> Something as short as that is less likely to be garbled in
>>>>>>> auto-translation than something technical, and it tells potential
>>>>>>> contributors what to do to help out.
>>>>>>>
>>>>>>
>>>>>> The trick would be to get people to visit that page. Unless it was on
>>>>>> the home page.
>>>>>>
>>>>>> -Rob
>>>>>>
>>>>>>> Don
>>>>>>
>>>>>
>>>>> OK, it took me a little while to weed through Google's info on this.
>>>>>
>>>>> A good sample can be found at:
>>>>>
>>>>> http://googleblog.blogspot.com/2009/09/translate-your-website-with-google.html
>>>>>
>>>>> Is there any possibility we could add the gadget to the OOo blogs site --
>>>>>
>>>>> https://blogs.apache.org/OOo/
>>>>>
>>>>> just for fun and see what we think?
>>>>> This way we'd just be impacting one page and not a whole site.
>>>>>
>>>>
>>>> If we want access to review and approve suggestions made by readers
>>>> then it needs to be on a domain that we "own". This is in common with
>>>> most Google services: you need to demonstrate that you control the
>>>> domain, typically by adding a special META tag to the homepage. For
>>>> *.openoffice.org this is easy, and I've already done this to enable
>>>> Google Analytics.
>>>> If we want to do the same for the blog we'd need
>>>> the ability to insert special markup into the <head> and <body> of the
>>>> blog template. I'm not sure whether this is possible with our Roller
>>>> setup.
>>>>
>>>
>>> oh -- well too bad. It could have been fun.
>>>
>>>
>>>>
>>>> Another way of testing this, in a quantitative way, is via what is
>>>> called "A/B Testing". With this approach we define an action a
>>>> satisfied site visitor might take, like downloading AOO 3.4. Then we
>>>> randomly show some users the original home page (or download page or
>>>> any other page we're testing). This is "A", and then we show other
>>>> users a different version, B. For example, B could have the
>>>> translation enabled. Then we run this "experiment" for a period of
>>>> time, like a week or two, tracking which version of the page has the
>>>> higher success rate with users.
>>>>
>>>
>>> hmmmm...interesting
>>>
>>> OK, I've looked at the rest of your post here and will think about this for
>>> a bit.
>>>
>>>
>>>>
>>>> If the machine translated page confuses users, or makes
>>>> them suspect the page, then the download percentage will be lower than
>>>> for the original page. And if the translated page is helpful then the
>>>> download numbers would be higher.
>>>>
>>>> You could imagine other success indicators. Pretty much anything that
>>>> has a URL can be measured. For example, imagine we add a link, "This
>>>> page solved my problem" to the bottom of every documentation page.
>>>> Even though the link would just go to a "thanks" page, we could use
>>>> that action to measure the success of translated versus untranslated
>>>> pages.
>>>>
>>>> Of course, we don't need to do this all at once. But I'd recommend we
>>>> think of ways of quantifying success. The website serves our users.
>>>> How do we know what is working well and what isn't? How can we design
>>>> experiments to test alternative approaches?
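The A/B procedure described above maps onto very little code. The sketch below only illustrates the mechanics; it is not how Google Analytics content experiments work internally. It assumes each visitor has some stable identifier (for example a cookie value), hashes it so that a returning visitor keeps seeing the same version, counts visits and successes per version, and adds a two-proportion z statistic (a standard test that the thread does not mention) as a rough check on whether a difference in success rates is more than noise. All names are hypothetical.

```python
from __future__ import annotations

import hashlib
import math
from dataclasses import dataclass, field


def assign_variant(visitor_id: str, experiment: str = "home-translate") -> str:
    """Pseudo-random 50/50 split that is stable per visitor and per experiment."""
    digest = hashlib.sha256(f"{experiment}:{visitor_id}".encode("utf-8")).digest()
    return "A" if digest[0] % 2 == 0 else "B"


@dataclass
class Experiment:
    visits: dict[str, int] = field(default_factory=lambda: {"A": 0, "B": 0})
    successes: dict[str, int] = field(default_factory=lambda: {"A": 0, "B": 0})

    def record(self, variant: str, succeeded: bool) -> None:
        """Log one visit and whether it led to the success action (e.g. a download)."""
        self.visits[variant] += 1
        if succeeded:
            self.successes[variant] += 1

    def rate(self, variant: str) -> float:
        """Success rate for one variant, 0.0 if it has no visits yet."""
        n = self.visits[variant]
        return self.successes[variant] / n if n else 0.0

    def z_score(self) -> float:
        """Two-proportion z statistic for B minus A, using the pooled success rate."""
        n_a, n_b = self.visits["A"], self.visits["B"]
        if n_a == 0 or n_b == 0:
            return 0.0
        pooled = (self.successes["A"] + self.successes["B"]) / (n_a + n_b)
        se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
        return (self.rate("B") - self.rate("A")) / se if se else 0.0
```

In use, the page template would call assign_variant to decide which version to serve, and each visit would be recorded once with record(variant, succeeded), where succeeded says whether that visitor went on to the success action. A |z| above about 1.96 corresponds to the usual two-sided 5% significance level; running for a week or two, as suggested above, helps collect enough visits for the comparison to mean something.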
>>>>
>>>>
>>>> Possible successes for users might be:
>>>>
>>>> - downloaded AOO
>>>>
>>>> - found answer to their question
>>>>
>>>> - signed up for our announcement list
>>>>
>>>> - entered their first bug report
>>>>
>>>> - signed up for one of the project lists
>>>>
>>>> - made first wiki contribution
>>>>
>>>> - followed/liked/+1'ed us on one of our social networking sites
>>>>
>>>> Measure, improve, repeat. Constant improvement and optimization.
>>>>
>>>> We can debate what will improve the website for the users. Or we can
>>>> test and measure. A/B testing is a new option for us, a technique
>>>> that once was used only by the largest commercial websites, but is now
>>>> available to everyone via Google's "content experiments" support in
>>>> Google Analytics.
>>>>
>>>> -Rob
>>>>
>>>>> I think that might be a perfect application for something like this.
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> ----------------------------------------------------------------------------------------
>>>>> MzK
>>>>>
>>>>> "I would rather have a donkey that takes me there
>>>>> than a horse that will not fare."
>>>>> -- Portuguese proverb
>>>>
>>>
>>
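Returning to the concern that opens the thread (what happens to index.fr.mdtext when index.mdtext changes), one lightweight check is sketched below. It swaps a content hash in for the svn diff mentioned in the thread: each translation is assumed to carry a hypothetical Source-Hash: header line recording a hash of the source text it was translated from, and the check reports every expected translation as missing, stale or current. The header name, the in-memory file layout and the function names are illustrative assumptions, not existing Apache CMS or Pootle features; recording the svn revision of the source instead of a hash would work the same way.

```python
from __future__ import annotations

import hashlib

HEADER = "Source-Hash"  # hypothetical header name, not an Apache CMS convention


def source_hash(text: str) -> str:
    """Short, stable fingerprint of the source page text."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()[:12]


def read_headers(text: str) -> dict[str, str]:
    """Collect leading 'Name: value' lines, stopping at the first blank or non-header line."""
    headers: dict[str, str] = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if not line.strip() or not sep:
            break
        headers[name.strip()] = value.strip()
    return headers


def translated_name(source_name: str, lang: str) -> str:
    """index.mdtext -> index.fr.mdtext, following the naming scheme in the thread."""
    stem, _, ext = source_name.rpartition(".")
    return f"{stem}.{lang}.{ext}"


def check_translations(pages: dict[str, str], langs: list[str]) -> dict[str, str]:
    """Map each expected translation file name to 'missing', 'stale' or 'current'."""
    report: dict[str, str] = {}
    for name, text in pages.items():
        if name.count(".") != 1:  # skip files that are themselves translations
            continue
        expected = source_hash(text)
        for lang in langs:
            target = translated_name(name, lang)
            if target not in pages:
                report[target] = "missing"
            elif read_headers(pages[target]).get(HEADER) != expected:
                report[target] = "stale"
            else:
                report[target] = "current"
    return report
```

Run over a checkout before each CMS-to-Pootle step, the "stale" entries would be exactly the pages whose translators need to be told that the source changed.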