Date: Sat, 30 Jan 2016 12:05:09 -0800
Subject: Re: [VOTE] Accept Joshua as an Apache Incubator Podling
From: Ashish
To: general@incubator.apache.org

+ (non-binding)

On Sat, Jan 30, 2016 at 12:00 PM, Mattmann, Chris A (3980) wrote:
> Hi Everyone,
>
> OK
> the discussion is now completed. Please VOTE to accept Joshua
> into the Apache Incubator. I'll leave the VOTE open for at least
> the next 72 hours, with hopes to close it next Friday the 5th of
> February, 2016.
>
> [ ] +1 Accept Joshua as an Apache Incubator podling.
> [ ] +0 Abstain.
> [ ] -1 Don't accept Joshua as an Apache Incubator podling because...
>
> Of course, I am +1 on this. Please note VOTEs from Incubator PMC
> members are binding but all are welcome to VOTE!
>
> Cheers,
> Chris
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: chris.a.mattmann@nasa.gov
> WWW: http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
> -----Original Message-----
> From: jpluser
> Date: Tuesday, January 12, 2016 at 10:56 PM
> To: "general@incubator.apache.org"
> Cc: "post@cs.jhu.edu"
> Subject: [DISCUSS] Apache Joshua Incubator Proposal - Machine Translation
> Toolkit
>
>>Hi Everyone,
>>
>>Please find attached for your viewing pleasure a proposed new project,
>>Apache Joshua, a statistical machine translation toolkit. The proposal
>>is in wiki draft form at: https://wiki.apache.org/incubator/JoshuaProposal
>>
>>Proposal text is copied below. I'll leave the discussion open for a week
>>and we are interested in folks who would like to be initial committers
>>and mentors. Please discuss here on the thread.
>>
>>Thanks!
>>
>>Cheers,
>>Chris (Champion)
>>
>>---
>>
>>= Joshua Proposal =
>>
>>== Abstract ==
>>[[joshua-decoder.org|Joshua]] is an open-source statistical machine
>>translation toolkit. It includes a Java-based decoder for translating with
>>phrase-based, hierarchical, and syntax-based translation models, a
>>Hadoop-based grammar extractor (Thrax), and an extensive set of tools and
>>scripts for training and evaluating new models from parallel text.
>>
>>== Proposal ==
>>Joshua is a state-of-the-art statistical machine translation system that
>>provides a number of features:
>>
>> * Support for the two main paradigms in statistical machine translation:
>>phrase-based and hierarchical / syntactic.
>> * A sparse feature API that makes it easy to add new feature templates
>>supporting millions of features
>> * Native implementations of many tuners (MERT, MIRA, PRO, and AdaGrad)
>> * Support for lattice decoding, allowing upstream NLP tools to expose
>>their hypothesis space to the MT system
>> * An efficient representation for models, allowing for quick loading of
>>multi-gigabyte model files
>> * Fast decoding speed (on par with Moses and mtplz)
>> * Language packs: precompiled models that allow the decoder to be run as
>>a black box
>> * Thrax, a Hadoop-based tool for learning translation models from
>>parallel text
>> * A suite of tools for constructing new models for any language pair for
>>which sufficient training data exists
>>
>>== Background and Rationale ==
>>A number of factors make this a good time for an Apache project focused on
>>machine translation (MT): the quality of MT output (for many language
>>pairs); the average computing resources available on computers, relative
>>to the needs of MT systems; and the availability of a number of
>>high-quality toolkits, together with a large base of researchers working
>>on them.
>>
>>Over the past decade, machine translation (MT; the automatic translation
>>of one human language to another) has become a reality. The research into
>>statistical approaches to translation that began in the early nineties,
>>the availability of large amounts of training data, and better computing
>>infrastructure have all come together to produce translation results
>>that are "good enough" for a large set of language pairs and use cases.
>>Free services like
>>[[https://www.bing.com/translator|Bing Translator]] and
>>[[https://translate.google.com|Google Translate]] have made these services
>>available to the average person through direct interfaces and through
>>tools like browser plugins, and sites across the world with higher
>>translation needs use them to translate their pages automatically.
>>
>>MT does not require the infrastructure of large corporations in order to
>>produce feasible output. Machine translation can be resource-intensive,
>>but need not be prohibitively so. Disk and memory usage are mostly a
>>matter of model size, which for most language pairs is a few gigabytes at
>>most, a size at which models can provide coverage on the order of tens or
>>even hundreds of thousands of words in the input and output languages. The
>>computational complexity of the algorithms used to search for translations
>>of new sentences is typically linear in the number of words in the input
>>sentence, making it possible to run a translation engine on a personal
>>computer.
>>
>>The research community has produced many different open source translation
>>projects for a range of programming languages and under a variety of
>>licenses. These projects include the core "decoder", which takes a model
>>and uses it to translate new sentences between the language pair the model
>>was defined for.
>>They also typically include a large set of tools that
>>enable new models to be built from large sets of example translations
>>("parallel data") and monolingual texts. These toolkits are usually built
>>to support the agendas of the (largely) academic researchers that build
>>them: the repeated cycle of building new models, tuning model parameters
>>against development data, and evaluating them against held-out test data,
>>using standard metrics for testing the quality of MT output.
>>
>>Together, these three factors (the quality of machine translation output,
>>the feasibility of translating on standard computers, and the availability
>>of tools to build models) make it reasonable for end users to use MT as
>>a black-box service and to run it on their personal machines.
>>
>>These factors make it a good time for an organization with the status of
>>the Apache Software Foundation to host a machine translation project.
>>
>>== Current Status ==
>>Joshua was originally ported from David Chiang's Python implementation of
>>Hiero by Zhifei Li, while he was a Ph.D. student at Johns Hopkins
>>University. The current version is maintained by Matt Post at Johns
>>Hopkins' Human Language Technology Center of Excellence. Joshua has made
>>many releases, with over 20 source code tags. The last release of
>>Joshua was 6.0.5 on November 5th, 2015.
>>
>>== Meritocracy ==
>>The current developers are familiar with meritocratic open source
>>development at Apache. Apache was chosen specifically because we want to
>>encourage this style of development for the project.
>>
>>== Community ==
>>Joshua is used widely across the world. Perhaps its biggest (known)
>>research / industrial user is the Amazon research group in Berlin. Another
>>user is the US Army Research Lab.
>>No formal census has been undertaken,
>>but posts to the Joshua technical support mailing list, along with
>>occasional contributions, suggest small research and academic communities
>>spread across the world, many of them in India.
>>
>>During incubation, we will explicitly seek to increase our usage across
>>the board, including academic research, industry, and other end users
>>interested in statistical machine translation.
>>
>>== Core Developers ==
>>The current set of core developers is fairly small, having shrunk as some
>>core student participants graduated from Johns Hopkins. However,
>>Joshua is used fairly widely, as mentioned above, and there remains a
>>commitment from the principal researcher at Johns Hopkins to continue to
>>use and develop it. Joshua has recently attracted interest from a number
>>of new community members due to its potential use in a
>>number of ongoing DARPA projects such as XDATA and Memex.
>>
>>== Alignment ==
>>Joshua is currently Copyright (c) 2015, Johns Hopkins University. All
>>rights reserved. It is licensed under the BSD 2-clause license. The
>>intention is, of course, to relicense this code under AL2.0, which would
>>permit expanded and increased use of the software within Apache projects.
>>There is currently an ongoing effort within the Apache Tika community to
>>utilize Joshua within Tika's Translate API, see
>>[[https://issues.apache.org/jira/browse/TIKA-1343|TIKA-1343]].
>>
>>== Known Risks ==
>>
>>=== Orphaned products ===
>>At the moment, regular contributions are made by a single contributor, the
>>lead maintainer. He (Matt Post) plans to continue development for the next
>>few years, but this is still a single point of failure, since the graduate
>>students who worked on the project have moved on to jobs, mostly in
>>industry.
>>However, our goal is to reduce this risk by growing the
>>community at Apache, at the very least with users and
>>participants from NASA JPL.
>>
>>=== Inexperience with Open Source ===
>>The teams at both Johns Hopkins and NASA JPL have experience with many OSS
>>projects at Apache and elsewhere. We understand "how it works"
>>here at the foundation.
>>
>>== Relationships with Other Apache Products ==
>>Joshua has dependencies on Hadoop and is also included as a plugin in
>>Apache Tika. We are also interested in coordinating with other projects,
>>including Spark and other projects needing MT services for language
>>translation.
>>
>>== Developers ==
>>Joshua has only one regular developer, who is employed by Johns Hopkins
>>University. NASA JPL (Mattmann and McGibbney) have been contributing
>>lately, including a Brew formula and other contributions to the project
>>through the DARPA XDATA and Memex programs.
>>
>>== Documentation ==
>>Documentation and publications related to Joshua can be found at
>>joshua-decoder.org. The source for the Joshua documentation is currently
>>hosted on GitHub at
>>https://github.com/joshua-decoder/joshua-decoder.github.com
>>
>>== Initial Source ==
>>Current source resides on GitHub: github.com/joshua-decoder/joshua (the
>>main decoder and toolkit) and github.com/joshua-decoder/thrax (the grammar
>>extraction tool).
>>
>>== External Dependencies ==
>>Joshua has a number of external dependencies. Only BerkeleyLM (Apache 2.0)
>>and KenLM (LGPL 2.1) are run-time decoder dependencies (one of which is
>>needed for translating sentences with pre-built models). The rest are
>>dependencies for the build system and pipeline, used for constructing and
>>training new models from parallel text.
>>
>>Apache projects:
>> * Ant
>> * Hadoop
>> * Commons
>> * Maven
>> * Ivy
>>
>>There are also a number of other open-source projects with various
>>licenses that the project depends on, both dynamically (at runtime) and
>>statically.
>>
>>=== GNU GPL 2 ===
>> * Berkeley Aligner: https://code.google.com/p/berkeleyaligner/
>>
>>=== LGPL 2.1 ===
>> * KenLM: github.com/kpu/kenlm
>>
>>=== Apache 2.0 ===
>> * BerkeleyLM: https://code.google.com/p/berkeleylm/
>>
>>=== GNU GPL ===
>> * GIZA++: http://www.statmt.org/moses/giza/GIZA++.html
>>
>>== Required Resources ==
>> * Mailing Lists
>>   * private@joshua.incubator.apache.org
>>   * dev@joshua.incubator.apache.org
>>   * commits@joshua.incubator.apache.org
>>
>> * Git Repos
>>   * https://git-wip-us.apache.org/repos/asf/joshua.git
>>
>> * Issue Tracking
>>   * JIRA Joshua (JOSHUA)
>>
>> * Continuous Integration
>>   * Jenkins builds on https://builds.apache.org/
>>
>> * Web
>>   * http://joshua.incubator.apache.org/
>>   * wiki at http://cwiki.apache.org
>>
>>== Initial Committers ==
>>The following is a list of the planned initial Apache committers (the
>>active subset of the committers for the current repository on GitHub).
>>
>> * Matt Post (post@cs.jhu.edu)
>> * Lewis John McGibbney (lewismc@apache.org)
>> * Chris Mattmann (mattmann@apache.org)
>>
>>== Affiliations ==
>>
>> * Johns Hopkins University
>>   * Matt Post
>>
>> * NASA JPL
>>   * Chris Mattmann
>>   * Lewis John McGibbney
>>
>>== Sponsors ==
>>=== Champion ===
>> * Chris Mattmann (NASA/JPL)
>>
>>=== Nominated Mentors ===
>> * Paul Ramirez
>> * Lewis John McGibbney
>> * Chris Mattmann
>>
>>== Sponsoring Entity ==
>>The Apache Incubator
>>
>>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>Chris Mattmann, Ph.D.
>>Chief Architect
>>Instrument Software and Science Data Systems Section (398)
>>NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>>Office: 168-519, Mailstop: 168-527
>>Email: chris.a.mattmann@nasa.gov
>>WWW: http://sunset.usc.edu/~mattmann/
>>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>Adjunct Associate Professor, Computer Science Department
>>University of Southern California, Los Angeles, CA 90089 USA
>>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-- 
thanks
ashish

Blog: http://www.ashishpaliwal.com/blog
My Photo Galleries: http://www.pbase.com/ashishpaliwal

---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
For additional commands, e-mail: general-help@incubator.apache.org