incubator-general mailing list archives

From "Adunuthula, Seshu" <sadunuth...@ebay.com>
Subject Re: [VOTE] Accept Joshua as an Apache Incubator Podling
Date Fri, 12 Feb 2016 20:51:25 GMT
Is there a fail grade? ;)


On 2/12/16, 11:57 AM, "Tom Barber" <tom.barber@meteorite.bi> wrote:

>You're making the presumption it's passed its vote! ;)
>
>On Fri, Feb 12, 2016 at 7:33 PM, Mattmann, Chris A (3980) <
>chris.a.mattmann@jpl.nasa.gov> wrote:
>
>> Yep, will send a result shortly.
>>
>> Lewis, after that, can you help me get the podling bootstrap tasks
>> started?
>>
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Chris Mattmann, Ph.D.
>> Chief Architect
>> Instrument Software and Science Data Systems Section (398)
>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> Office: 168-519, Mailstop: 168-527
>> Email: chris.a.mattmann@nasa.gov
>> WWW:  http://sunset.usc.edu/~mattmann/
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Adjunct Associate Professor, Computer Science Department
>> University of Southern California, Los Angeles, CA 90089 USA
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>
>>
>>
>>
>>
>> -----Original Message-----
>> From: Lewis John Mcgibbney <lewis.mcgibbney@gmail.com>
>> Reply-To: "general@incubator.apache.org" <general@incubator.apache.org>
>> Date: Friday, February 12, 2016 at 11:31 AM
>> To: "general@incubator.apache.org" <general@incubator.apache.org>
>> Subject: Re: [VOTE] Accept Joshua as an Apache Incubator Podling
>>
>> >Hi Chris,
>> >Is it time to close out this VOTE and bring Joshua on board?
>> >Lewis
>> >
>> >On Wed, Feb 3, 2016 at 4:01 PM, <general-digest-help@incubator.apache.org>
>> >wrote:
>> >
>> >>
>> >> From: Danese Cooper <danese@gmail.com>
>> >> To: "general@incubator.apache.org" <general@incubator.apache.org>
>> >> Cc: "post@cs.jhu.edu" <post@cs.jhu.edu>
>> >> Date: Wed, 3 Feb 2016 07:43:11 -0800
>> >> Subject: Re: [VOTE] Accept Joshua as an Apache Incubator Podling
>> >> +1 (binding) Accept Joshua as an Apache Incubator podling.
>> >>
>> >> D
>> >>
>> >> > On Jan 30, 2016, at 12:00 PM, Mattmann, Chris A (3980) <
>> >> chris.a.mattmann@jpl.nasa.gov> wrote:
>> >> >
>> >> > Hi Everyone,
>> >> >
>> >> > OK the discussion is now completed. Please VOTE to accept Joshua
>> >> > into the Apache Incubator. I’ll leave the VOTE open for at least
>> >> > the next 72 hours, with hopes to close it next Friday the 5th of
>> >> > February, 2016.
>> >> >
>> >> > [ ] +1 Accept Joshua as an Apache Incubator podling.
>> >> > [ ] +0 Abstain.
>> >> > [ ] -1 Don’t accept Joshua as an Apache Incubator podling because..
>> >> >
>> >> > Of course, I am +1 on this. Please note VOTEs from Incubator PMC
>> >> > members are binding but all are welcome to VOTE!
>> >> >
>> >> > Cheers,
>> >> > Chris
>> >> >
>> >> > -----Original Message-----
>> >> > From: jpluser <chris.a.mattmann@jpl.nasa.gov>
>> >> > Date: Tuesday, January 12, 2016 at 10:56 PM
>> >> > To: "general@incubator.apache.org" <general@incubator.apache.org>
>> >> > Cc: "post@cs.jhu.edu" <post@cs.jhu.edu>
>> >> > Subject: [DISCUSS] Apache Joshua Incubator Proposal - Machine Translation Toolkit
>> >> >
>> >> >> Hi Everyone,
>> >> >>
>> >> >> Please find attached for your viewing pleasure a proposed new project,
>> >> >> Apache Joshua, a statistical machine translation toolkit. The proposal
>> >> >> is in wiki draft form at:
>> >> >> https://wiki.apache.org/incubator/JoshuaProposal
>> >> >>
>> >> >> Proposal text is copied below. I’ll leave the discussion open for a week
>> >> >> and we are interested in folks who would like to be initial committers
>> >> >> and mentors. Please discuss here on the thread.
>> >> >>
>> >> >> Thanks!
>> >> >>
>> >> >> Cheers,
>> >> >> Chris (Champion)
>> >> >>
>> >> >> ---
>> >> >>
>> >> >> = Joshua Proposal =
>> >> >>
>> >> >> == Abstract ==
>> >> >> [[joshua-decoder.org|Joshua]] is an open-source statistical machine
>> >> >> translation toolkit. It includes a Java-based decoder for translating with
>> >> >> phrase-based, hierarchical, and syntax-based translation models, a
>> >> >> Hadoop-based grammar extractor (Thrax), and an extensive set of tools and
>> >> >> scripts for training and evaluating new models from parallel text.
>> >> >>
>> >> >> == Proposal ==
>> >> >> Joshua is a state of the art statistical machine translation system that
>> >> >> provides a number of features:
>> >> >>
>> >> >> * Support for the two main paradigms in statistical machine translation:
>> >> >> phrase-based and hierarchical / syntactic.
>> >> >> * A sparse feature API that makes it easy to add new feature templates
>> >> >> supporting millions of features
>> >> >> * Native implementations of many tuners (MERT, MIRA, PRO, and AdaGrad)
>> >> >> * Support for lattice decoding, allowing upstream NLP tools to expose
>> >> >> their hypothesis space to the MT system
>> >> >> * An efficient representation for models, allowing for quick loading of
>> >> >> multi-gigabyte model files
>> >> >> * Fast decoding speed (on par with Moses and mtplz)
>> >> >> * Language packs: precompiled models that allow the decoder to be run as
>> >> >> a black box
>> >> >> * Thrax, a Hadoop-based tool for learning translation models from
>> >> >> parallel text
>> >> >> * A suite of tools for constructing new models for any language pair for
>> >> >> which sufficient training data exists
>> >> >>
>> >> >> == Background and Rationale ==
>> >> >> A number of factors make this a good time for an Apache project focused on
>> >> >> machine translation (MT): the quality of MT output (for many language
>> >> >> pairs); the average computing resources available on computers, relative
>> >> >> to the needs of MT systems; and the availability of a number of
>> >> >> high-quality toolkits, together with a large base of researchers working
>> >> >> on them.
>> >> >>
>> >> >> Over the past decade, machine translation (MT; the automatic translation
>> >> >> of one human language to another) has become a reality. The research into
>> >> >> statistical approaches to translation that began in the early nineties,
>> >> >> together with the availability of large amounts of training data and
>> >> >> better computing infrastructure, have all come together to produce
>> >> >> translation results that are “good enough” for a large set of language
>> >> >> pairs and use cases. Free services like
>> >> >> [[https://www.bing.com/translator|Bing Translator]] and
>> >> >> [[https://translate.google.com|Google Translate]] have made these services
>> >> >> available to the average person through direct interfaces and through
>> >> >> tools like browser plugins, and sites across the world with higher
>> >> >> translation needs use them to translate their pages automatically.
>> >> >>
>> >> >> MT does not require the infrastructure of large corporations in order to
>> >> >> produce feasible output. Machine translation can be resource-intensive,
>> >> >> but need not be prohibitively so. Disk and memory usage are mostly a
>> >> >> matter of model size, which for most language pairs is a few gigabytes at
>> >> >> most, at which size models can provide coverage on the order of tens or
>> >> >> even hundreds of thousands of words in the input and output languages. The
>> >> >> computational complexity of the algorithms used to search for translations
>> >> >> of new sentences is typically linear in the number of words in the input
>> >> >> sentence, making it possible to run a translation engine on a personal
>> >> >> computer.
>> >> >>
>> >> >> The research community has produced many different open source translation
>> >> >> projects for a range of programming languages and under a variety of
>> >> >> licenses. These projects include the core “decoder”, which takes a model
>> >> >> and uses it to translate new sentences between the language pair the model
>> >> >> was defined for. They also typically include a large set of tools that
>> >> >> enable new models to be built from large sets of example translations
>> >> >> (“parallel data”) and monolingual texts. These toolkits are usually built
>> >> >> to support the agendas of the (largely) academic researchers that build
>> >> >> them: the repeated cycle of building new models, tuning model parameters
>> >> >> against development data, and evaluating them against held-out test data,
>> >> >> using standard metrics for testing the quality of MT output.
>> >> >>
>> >> >> Together, these three factors (the quality of machine translation output,
>> >> >> the feasibility of translating on standard computers, and the availability
>> >> >> of tools to build models) make it reasonable for end users to use MT as
>> >> >> a black-box service, and to run it on their personal machine.
>> >> >>
>> >> >> These factors make it a good time for an organization with the status of
>> >> >> the Apache Software Foundation to host a machine translation project.
>> >> >>
>> >> >> == Current Status ==
>> >> >> Joshua was originally ported from David Chiang’s Python implementation of
>> >> >> Hiero by Zhifei Li, while he was a Ph.D. student at Johns Hopkins
>> >> >> University. The current version is maintained by Matt Post at Johns
>> >> >> Hopkins’ Human Language Technology Center of Excellence. Joshua has made
>> >> >> many releases with a list of over 20 source code tags. The last release of
>> >> >> Joshua was 6.0.5 on November 5th, 2015.
>> >> >>
>> >> >> == Meritocracy ==
>> >> >> The current developers are familiar with meritocratic open source
>> >> >> development at Apache. Apache was chosen specifically because we want to
>> >> >> encourage this style of development for the project.
>> >> >>
>> >> >> == Community ==
>> >> >> Joshua is used widely across the world. Perhaps its biggest (known)
>> >> >> research / industrial user is the Amazon research group in Berlin. Another
>> >> >> user is the US Army Research Lab. No formal census has been undertaken,
>> >> >> but posts to the Joshua technical support mailing list, along with the
>> >> >> occasional contributions, suggest small research and academic communities
>> >> >> spread across the world, many of them in India.
>> >> >>
>> >> >> During incubation, we will explicitly seek to increase our usage across
>> >> >> the board, including academic research, industry, and other end users
>> >> >> interested in statistical machine translation.
>> >> >>
>> >> >> == Core Developers ==
>> >> >> The current set of core developers is fairly small, having fallen with the
>> >> >> graduation from Johns Hopkins of some core student participants. However,
>> >> >> Joshua is used fairly widely, as mentioned above, and there remains a
>> >> >> commitment from the principal researcher at Johns Hopkins to continue to
>> >> >> use and develop it. Joshua has seen a number of new community members
>> >> >> become interested recently due to a potential for its projected use in a
>> >> >> number of ongoing DARPA projects such as XDATA and Memex.
>> >> >>
>> >> >> == Alignment ==
>> >> >> Joshua is currently Copyright (c) 2015, Johns Hopkins University, all
>> >> >> rights reserved, and licensed under the BSD 2-clause license. It would of
>> >> >> course be the intention to relicense this code under AL2.0, which would
>> >> >> permit expanded and increased use of the software within Apache projects.
>> >> >> There is currently an ongoing effort within the Apache Tika community to
>> >> >> utilize Joshua within Tika’s Translate API; see
>> >> >> [[https://issues.apache.org/jira/browse/TIKA-1343|TIKA-1343]].
>> >> >>
>> >> >> == Known Risks ==
>> >> >>
>> >> >> === Orphaned products ===
>> >> >> At the moment, regular contributions are made by a single contributor, the
>> >> >> lead maintainer. He (Matt Post) plans to continue development for the next
>> >> >> few years, but it is still a single point of failure, since the graduate
>> >> >> students who worked on the project have moved on to jobs, mostly in
>> >> >> industry. However, our goal is to reduce that risk by growing the
>> >> >> community at Apache, and at least by growing the community with users and
>> >> >> participants from NASA JPL.
>> >> >>
>> >> >> === Inexperience with Open Source ===
>> >> >> The teams at both Johns Hopkins and NASA JPL have experience with many OSS
>> >> >> projects at Apache and elsewhere. We understand "how it works"
>> >> >> here at the foundation.
>> >> >>
>> >> >>
>> >> >> == Relationships with Other Apache Products ==
>> >> >> Joshua includes dependencies on Hadoop, and also is included as a plugin in
>> >> >> Apache Tika. We are also interested in coordinating with other projects
>> >> >> including Spark, and other projects needing MT services for language
>> >> >> translation.
>> >> >>
>> >> >> == Developers ==
>> >> >> Joshua has only one regular developer, who is employed by Johns Hopkins
>> >> >> University. NASA JPL (Mattmann and McGibbney) has been contributing
>> >> >> lately, including a Brew formula and other contributions to the project,
>> >> >> through the DARPA XDATA and Memex programs.
>> >> >>
>> >> >> == Documentation ==
>> >> >> Documentation and publications related to Joshua can be found at
>> >> >> joshua-decoder.org. The source for the Joshua documentation is currently
>> >> >> hosted on GitHub at
>> >> >> https://github.com/joshua-decoder/joshua-decoder.github.com
>> >> >>
>> >> >> == Initial Source ==
>> >> >> Current source resides at GitHub: github.com/joshua-decoder/joshua (the
>> >> >> main decoder and toolkit) and github.com/joshua-decoder/thrax (the grammar
>> >> >> extraction tool).
>> >> >>
>> >> >> == External Dependencies ==
>> >> >> Joshua has a number of external dependencies. Only BerkeleyLM (Apache 2.0)
>> >> >> and KenLM (LGPL 2.1) are run-time decoder dependencies (one of which is
>> >> >> needed for translating sentences with pre-built models). The rest are
>> >> >> dependencies for the build system and pipeline, used for constructing and
>> >> >> training new models from parallel text.
>> >> >>
>> >> >> Apache projects:
>> >> >> * Ant
>> >> >> * Hadoop
>> >> >> * Commons
>> >> >> * Maven
>> >> >> * Ivy
>> >> >>
>> >> >> There are also a number of other open-source projects with various
>> >> >> licenses that the project depends on both dynamically (runtime), and
>> >> >> statically.
>> >> >>
>> >> >> === GNU GPL 2 ===
>> >> >> * Berkeley Aligner: https://code.google.com/p/berkeleyaligner/
>> >> >>
>> >> >> === LGPL 2.1 ===
>> >> >> * KenLM: github.com/kpu/kenlm
>> >> >>
>> >> >> === Apache 2.0 ===
>> >> >> * BerkeleyLM: https://code.google.com/p/berkeleylm/
>> >> >>
>> >> >> === GNU GPL ===
>> >> >> * GIZA++: http://www.statmt.org/moses/giza/GIZA++.html
>> >> >>
>> >> >> == Required Resources ==
>> >> >> * Mailing Lists
>> >> >>  * private@joshua.incubator.apache.org
>> >> >>  * dev@joshua.incubator.apache.org
>> >> >>  * commits@joshua.incubator.apache.org
>> >> >>
>> >> >> * Git Repos
>> >> >>  * https://git-wip-us.apache.org/repos/asf/joshua.git
>> >> >>
>> >> >> * Issue Tracking
>> >> >>  * JIRA Joshua (JOSHUA)
>> >> >>
>> >> >> * Continuous Integration
>> >> >>  * Jenkins builds on https://builds.apache.org/
>> >> >>
>> >> >> * Web
>> >> >>  * http://joshua.incubator.apache.org/
>> >> >>  * wiki at http://cwiki.apache.org
>> >> >>
>> >> >> == Initial Committers ==
>> >> >> The following is a list of the planned initial Apache committers (the
>> >> >> active subset of the committers for the current repository on GitHub).
>> >> >>
>> >> >> * Matt Post (post@cs.jhu.edu)
>> >> >> * Lewis John McGibbney (lewismc@apache.org)
>> >> >> * Chris Mattmann (mattmann@apache.org)
>> >> >>
>> >> >> == Affiliations ==
>> >> >>
>> >> >> * Johns Hopkins University
>> >> >>  * Matt Post
>> >> >>
>> >> >> * NASA JPL
>> >> >>  * Chris Mattmann
>> >> >>  * Lewis John McGibbney
>> >> >>
>> >> >>
>> >> >> == Sponsors ==
>> >> >> === Champion ===
>> >> >> * Chris Mattmann (NASA/JPL)
>> >> >>
>> >> >> === Nominated Mentors ===
>> >> >> * Paul Ramirez
>> >> >> * Lewis John McGibbney
>> >> >> * Chris Mattmann
>> >> >>
>> >> >> == Sponsoring Entity ==
>> >> >> The Apache Incubator
>> >> >>
>>
>>
