Mailing-List: contact general-help@incubator.apache.org; run by ezmlm
Reply-To: general@incubator.apache.org
MIME-Version: 1.0
Date: Sat, 30 Jan 2016 20:58:27 -0800
Subject: Re: [VOTE] Accept Joshua as an Apache Incubator Podling
From: Henri Yandell
To: general@incubator.apache.org
Content-Type: multipart/alternative; boundary=001a1140f9887345ab052a9a1e4d

--001a1140f9887345ab052a9a1e4d
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

+1 (non-binding).

On Sat, Jan 30, 2016 at 5:45 PM, Luke Han wrote:

> +1 non-binding
>
> Best Regards!
> ---------------------
>
> Luke Han
>
> On Sun, Jan 31, 2016 at 5:27 AM, Tom Barber wrote:
>
> > +1 binding
> >
> > Should be a very interesting project!
> >
> > On Sat, Jan 30, 2016 at 8:05 PM, Ashish wrote:
> >
> > > + (non-binding)
> > >
> > > On Sat, Jan 30, 2016 at 12:00 PM, Mattmann, Chris A (3980) wrote:
> > > > Hi Everyone,
> > > >
> > > > OK the discussion is now completed. Please VOTE to accept Joshua
> > > > into the Apache Incubator. I’ll leave the VOTE open for at least
> > > > the next 72 hours, with hopes to close it next Friday the 5th of
> > > > February, 2016.
> > > >
> > > > [ ] +1 Accept Joshua as an Apache Incubator podling.
> > > > [ ] +0 Abstain.
> > > > [ ] -1 Don’t accept Joshua as an Apache Incubator podling because..
> > > >
> > > > Of course, I am +1 on this. Please note VOTEs from Incubator PMC
> > > > members are binding but all are welcome to VOTE!
> > > >
> > > > Cheers,
> > > > Chris
> > > >
> > > > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > > > Chris Mattmann, Ph.D.
> > > > Chief Architect
> > > > Instrument Software and Science Data Systems Section (398)
> > > > NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> > > > Office: 168-519, Mailstop: 168-527
> > > > Email: chris.a.mattmann@nasa.gov
> > > > WWW: http://sunset.usc.edu/~mattmann/
> > > > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > > > Adjunct Associate Professor, Computer Science Department
> > > > University of Southern California, Los Angeles, CA 90089 USA
> > > > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > > >
> > > > -----Original Message-----
> > > > From: jpluser
> > > > Date: Tuesday, January 12, 2016 at 10:56 PM
> > > > To: "general@incubator.apache.org"
> > > > Cc: "post@cs.jhu.edu"
> > > > Subject: [DISCUSS] Apache Joshua Incubator Proposal - Machine Translation Toolkit
> > > >
> > > >>Hi Everyone,
> > > >>
> > > >>Please find attached for your viewing pleasure a proposed new project,
> > > >>Apache Joshua, a statistical machine translation toolkit. The proposal
> > > >>is in wiki draft form at: https://wiki.apache.org/incubator/JoshuaProposal
> > > >>
> > > >>Proposal text is copied below. I’ll leave the discussion open for a week
> > > >>and we are interested in folks who would like to be initial committers
> > > >>and mentors. Please discuss here on the thread.
> > > >>
> > > >>Thanks!
> > > >>
> > > >>Cheers,
> > > >>Chris (Champion)
> > > >>
> > > >>---
> > > >>
> > > >>= Joshua Proposal =
> > > >>
> > > >>== Abstract ==
> > > >>[[joshua-decoder.org|Joshua]] is an open-source statistical machine
> > > >>translation toolkit.
> > > >>It includes a Java-based decoder for translating with
> > > >>phrase-based, hierarchical, and syntax-based translation models, a
> > > >>Hadoop-based grammar extractor (Thrax), and an extensive set of tools and
> > > >>scripts for training and evaluating new models from parallel text.
> > > >>
> > > >>== Proposal ==
> > > >>Joshua is a state-of-the-art statistical machine translation system that
> > > >>provides a number of features:
> > > >>
> > > >> * Support for the two main paradigms in statistical machine translation:
> > > >>phrase-based and hierarchical / syntactic.
> > > >> * A sparse feature API that makes it easy to add new feature templates
> > > >>supporting millions of features
> > > >> * Native implementations of many tuners (MERT, MIRA, PRO, and AdaGrad)
> > > >> * Support for lattice decoding, allowing upstream NLP tools to expose
> > > >>their hypothesis space to the MT system
> > > >> * An efficient representation for models, allowing for quick loading of
> > > >>multi-gigabyte model files
> > > >> * Fast decoding speed (on par with Moses and mtplz)
> > > >> * Language packs: precompiled models that allow the decoder to be run as
> > > >>a black box
> > > >> * Thrax, a Hadoop-based tool for learning translation models from
> > > >>parallel text
> > > >> * A suite of tools for constructing new models for any language pair for
> > > >>which sufficient training data exists
> > > >>
> > > >>== Background and Rationale ==
> > > >>A number of factors make this a good time for an Apache project focused on
> > > >>machine translation (MT): the quality of MT output (for many language
> > > >>pairs); the average computing resources available on computers, relative
> > > >>to the needs of MT systems; and the availability of a number of
> > > >>high-quality toolkits, together with a large base of researchers working
> > > >>on them.
> > > >>
> > > >>Over the past decade, machine translation (MT; the automatic translation
> > > >>of one human language to another) has become a reality. The research into
> > > >>statistical approaches to translation that began in the early nineties,
> > > >>together with the availability of large amounts of training data and
> > > >>better computing infrastructure, have all come together to produce
> > > >>translation results that are “good enough” for a large set of language
> > > >>pairs and use cases. Free services like
> > > >>[[https://www.bing.com/translator|Bing Translator]] and
> > > >>[[https://translate.google.com|Google Translate]] have made these services
> > > >>available to the average person through direct interfaces and through
> > > >>tools like browser plugins, and sites across the world with higher
> > > >>translation needs use them to translate their pages automatically.
> > > >>
> > > >>MT does not require the infrastructure of large corporations in order to
> > > >>produce feasible output. Machine translation can be resource-intensive,
> > > >>but need not be prohibitively so. Disk and memory usage are mostly a
> > > >>matter of model size, which for most language pairs is a few gigabytes at
> > > >>most, at which size models can provide coverage on the order of tens or
> > > >>even hundreds of thousands of words in the input and output languages. The
> > > >>computational complexity of the algorithms used to search for translations
> > > >>of new sentences is typically linear in the number of words in the input
> > > >>sentence, making it possible to run a translation engine on a personal
> > > >>computer.
> > > >>
> > > >>The research community has produced many different open source translation
> > > >>projects for a range of programming languages and under a variety of
> > > >>licenses.
> > > >>These projects include the core “decoder”, which takes a model
> > > >>and uses it to translate new sentences between the language pair the model
> > > >>was defined for. They also typically include a large set of tools that
> > > >>enable new models to be built from large sets of example translations
> > > >>(“parallel data”) and monolingual texts. These toolkits are usually built
> > > >>to support the agendas of the (largely) academic researchers that build
> > > >>them: the repeated cycle of building new models, tuning model parameters
> > > >>against development data, and evaluating them against held-out test data,
> > > >>using standard metrics for testing the quality of MT output.
> > > >>
> > > >>Together, these three factors (the quality of machine translation output,
> > > >>the feasibility of translating on standard computers, and the availability
> > > >>of tools to build models) make it reasonable for end users to use MT as
> > > >>a black-box service, and to run it on their personal machines.
> > > >>
> > > >>These factors make it a good time for an organization with the status of
> > > >>the Apache Software Foundation to host a machine translation project.
> > > >>
> > > >>== Current Status ==
> > > >>Joshua was originally ported from David Chiang’s Python implementation of
> > > >>Hiero by Zhifei Li, while he was a Ph.D. student at Johns Hopkins
> > > >>University. The current version is maintained by Matt Post at Johns
> > > >>Hopkins’ Human Language Technology Center of Excellence. Joshua has made
> > > >>many releases with a list of over 20 source code tags. The last release of
> > > >>Joshua was 6.0.5 on November 5th, 2015.
> > > >>
> > > >>== Meritocracy ==
> > > >>The current developers are familiar with meritocratic open source
> > > >>development at Apache.
> > > >>Apache was chosen specifically because we want to
> > > >>encourage this style of development for the project.
> > > >>
> > > >>== Community ==
> > > >>Joshua is used widely across the world. Perhaps its biggest (known)
> > > >>research / industrial user is the Amazon research group in Berlin. Another
> > > >>user is the US Army Research Lab. No formal census has been undertaken,
> > > >>but posts to the Joshua technical support mailing list, along with the
> > > >>occasional contributions, suggest small research and academic communities
> > > >>spread across the world, many of them in India.
> > > >>
> > > >>During incubation, we will explicitly seek to increase our usage across
> > > >>the board, including academic research, industry, and other end users
> > > >>interested in statistical machine translation.
> > > >>
> > > >>== Core Developers ==
> > > >>The current set of core developers is fairly small, having fallen with the
> > > >>graduation from Johns Hopkins of some core student participants. However,
> > > >>Joshua is used fairly widely, as mentioned above, and there remains a
> > > >>commitment from the principal researcher at Johns Hopkins to continue to
> > > >>use and develop it. Joshua has seen a number of new community members
> > > >>become interested recently due to a potential for its projected use in a
> > > >>number of ongoing DARPA projects such as XDATA and Memex.
> > > >>
> > > >>== Alignment ==
> > > >>Joshua is currently Copyright (c) 2015, Johns Hopkins University, all
> > > >>rights reserved, and licensed under the BSD 2-clause license. It would of
> > > >>course be the intention to relicense this code under AL2.0, which would
> > > >>permit expanded and increased use of the software within Apache projects.
> > > >>There is currently an ongoing effort within the Apache Tika community to
> > > >>utilize Joshua within Tika’s Translate API, see
> > > >>[[https://issues.apache.org/jira/browse/TIKA-1343|TIKA-1343]].
> > > >>
> > > >>== Known Risks ==
> > > >>
> > > >>=== Orphaned products ===
> > > >>At the moment, regular contributions are made by a single contributor, the
> > > >>lead maintainer. He (Matt Post) plans to continue development for the next
> > > >>few years, but it is still a single point of failure, since the graduate
> > > >>students who worked on the project have moved on to jobs, mostly in
> > > >>industry. However, our goal is to help that process by growing the
> > > >>community in Apache, and at least in growing the community with users and
> > > >>participants from NASA JPL.
> > > >>
> > > >>=== Inexperience with Open Source ===
> > > >>The teams at both Johns Hopkins and NASA JPL have experience with many OSS
> > > >>software projects at Apache and elsewhere. We understand "how it works"
> > > >>here at the foundation.
> > > >>
> > > >>== Relationships with Other Apache Products ==
> > > >>Joshua includes dependencies on Hadoop, and also is included as a plugin in
> > > >>Apache Tika. We are also interested in coordinating with other projects
> > > >>including Spark, and other projects needing MT services for language
> > > >>translation.
> > > >>
> > > >>== Developers ==
> > > >>Joshua only has one regular developer who is employed by Johns Hopkins
> > > >>University. NASA JPL (Mattmann and McGibbney) have been contributing
> > > >>lately including a Brew formula and other contributions to the project
> > > >>through the DARPA XDATA and Memex programs.
> > > >>
> > > >>== Documentation ==
> > > >>Documentation and publications related to Joshua can be found at
> > > >>joshua-decoder.org.
> > > >>The source for the Joshua documentation is currently
> > > >>hosted on Github at
> > > >>https://github.com/joshua-decoder/joshua-decoder.github.com
> > > >>
> > > >>== Initial Source ==
> > > >>Current source resides at Github: github.com/joshua-decoder/joshua (the
> > > >>main decoder and toolkit) and github.com/joshua-decoder/thrax (the grammar
> > > >>extraction tool).
> > > >>
> > > >>== External Dependencies ==
> > > >>Joshua has a number of external dependencies. Only BerkeleyLM (Apache 2.0)
> > > >>and KenLM (LGPL 2.1) are run-time decoder dependencies (one of which is
> > > >>needed for translating sentences with pre-built models). The rest are
> > > >>dependencies for the build system and pipeline, used for constructing and
> > > >>training new models from parallel text.
> > > >>
> > > >>Apache projects:
> > > >> * Ant
> > > >> * Hadoop
> > > >> * Commons
> > > >> * Maven
> > > >> * Ivy
> > > >>
> > > >>There are also a number of other open-source projects with various
> > > >>licenses that the project depends on both dynamically (at runtime) and
> > > >>statically.
> > > >>
> > > >>=== GNU GPL 2 ===
> > > >> * Berkeley Aligner: https://code.google.com/p/berkeleyaligner/
> > > >>
> > > >>=== LGPL 2.1 ===
> > > >> * KenLM: github.com/kpu/kenlm
> > > >>
> > > >>=== Apache 2.0 ===
> > > >> * BerkeleyLM: https://code.google.com/p/berkeleylm/
> > > >>
> > > >>=== GNU GPL ===
> > > >> * GIZA++: http://www.statmt.org/moses/giza/GIZA++.html
> > > >>
> > > >>== Required Resources ==
> > > >> * Mailing Lists
> > > >>   * private@joshua.incubator.apache.org
> > > >>   * dev@joshua.incubator.apache.org
> > > >>   * commits@joshua.incubator.apache.org
> > > >>
> > > >> * Git Repos
> > > >>   * https://git-wip-us.apache.org/repos/asf/joshua.git
> > > >>
> > > >> * Issue Tracking
> > > >>   * JIRA Joshua (JOSHUA)
> > > >>
> > > >> * Continuous Integration
> > > >>   * Jenkins builds on https://builds.apache.org/
> > > >>
> > > >> * Web
> > > >>   * http://joshua.incubator.apache.org/
> > > >>   * wiki at http://cwiki.apache.org
> > > >>
> > > >>== Initial Committers ==
> > > >>The following is a list of the planned initial Apache committers (the
> > > >>active subset of the committers for the current repository on Github).
> > > >>
> > > >> * Matt Post (post@cs.jhu.edu)
> > > >> * Lewis John McGibbney (lewismc@apache.org)
> > > >> * Chris Mattmann (mattmann@apache.org)
> > > >>
> > > >>== Affiliations ==
> > > >>
> > > >> * Johns Hopkins University
> > > >>   * Matt Post
> > > >>
> > > >> * NASA JPL
> > > >>   * Chris Mattmann
> > > >>   * Lewis John McGibbney
> > > >>
> > > >>== Sponsors ==
> > > >>=== Champion ===
> > > >> * Chris Mattmann (NASA/JPL)
> > > >>
> > > >>=== Nominated Mentors ===
> > > >> * Paul Ramirez
> > > >> * Lewis John McGibbney
> > > >> * Chris Mattmann
> > > >>
> > > >>== Sponsoring Entity ==
> > > >>The Apache Incubator
> > > >>
> > >
> > > --
> > > thanks
> > > ashish
> > >
> > > Blog: http://www.ashishpaliwal.com/blog
> > > My Photo Galleries: http://www.pbase.com/ashishpaliwal

--001a1140f9887345ab052a9a1e4d--