cordova-dev mailing list archives

From: Ted Dunning
Subject: Re: Apache Cordova "Translated using Microsoft Bing Translation"...
Date: Fri, 30 May 2014 19:06:23 GMT
On Fri, May 30, 2014 at 1:29 AM, Josh Soref wrote:

> Ted Dunning wrote:
> > Also, if you choose to switch to a different translator at some point, it
> > is likely that they will use the previous translations as the base for a
> > translation memory even if humans are doing the translation.  That counts
> > as the project using the text to train a translation engine.
> I don't think that counts.
> ...

> If "my translation app" takes your X->Y and uses it to apply to the next
> application it sees, then it's opening itself up to some really bad
> poisoning models. Because there's a lot of garbage that will be uploaded
> into translation engines. I'd be shocked if anyone actually did this.

Google Translate does this.  They detect parallel text on the web and build
language and translation models using techniques that go back, more or
less, to the Candide work at IBM.  The really big addition that Google made
is that they can and do detect parallel text that is not explicitly marked
as parallel.

This means that if somebody later translated the Cordova stuff using Google
Translate, the result would likely include this earlier Bing content.

> And yes, I do maintain translation tools. My tools certainly wouldn't do
> this. I maintain translation tools because I've seen the quality of
> translations, and they're awful.

I trust you about your tools.  That doesn't imply others avoid doing this.

> The goal of such a restriction is to prevent someone from using this
> output as a basis for making another generic translation tool.
> If someone takes a document from Spanish, and uses Bing to translate it
> into French, Microsoft is not going to complain if someone later takes
> that (French) document and translates it into Italian, no matter who does
> the translation.
> They're only concerned if someone takes the mapping between Spanish words
> and French words and uses it on an unrelated corpus / to improve their
> handling of unrelated corpora.

Do you say this from actual knowledge of Microsoft's intent?  Or are you
depending on what you read here?
