incubator-general mailing list archives

From "Mattmann, Chris A (3980)" <chris.a.mattm...@jpl.nasa.gov>
Subject Re: [DISCUSS] Apache Joshua Incubator Proposal - Machine Translation Toolkit
Date Wed, 20 Jan 2016 18:37:45 GMT
All of this can be worked out during Incubation, and I think we have
the right folks here who can help get it set up.

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattmann@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++





-----Original Message-----
From: Henry Saputra <henry.saputra@gmail.com>
Reply-To: "general@incubator.apache.org" <general@incubator.apache.org>
Date: Wednesday, January 20, 2016 at 10:33 AM
To: "general@incubator.apache.org" <general@incubator.apache.org>
Subject: Re: [DISCUSS] Apache Joshua Incubator Proposal - Machine Translation Toolkit

>This is a bit tricky and I suppose we could leave behind the GPL/LGPL
>dependencies that are used for model building when generating releases
>under ASF license.
>I hope that will work.
>
>- Henry
>
>On Wed, Jan 20, 2016 at 7:51 AM, Matt Post <post@cs.jhu.edu> wrote:
>
>> I imagine so. Model building is very technical and resource intensive
>> and something only a few people will want or need to do. Working on and
>> running the decoder (#2) should be the much more common use case, and
>> with the (included, Apache-licensed) Berkeley LM, that can be done
>> without the need for any external dependencies.
>>
>>
>> > On Jan 20, 2016, at 10:46 AM, Alex Harui <aharui@adobe.com> wrote:
>> >
>> > External is good news.  I'm not sure how much leeway there is in the
>> > following quote from [1], but what percentage of your users are
>> > currently using an all-ASF-compatible set of projects?
>> >
>> >    The question to ask yourself in this situation is:
>> >        * "Will the majority of users want to use my
>> >           product without adding the optional components?"
>> >
>> > -Alex
>> >
>> > [1] http://www.apache.org/legal/resolved.html
>> >
>> >
>> > On 1/20/16, 7:17 AM, "Matt Post" <post@cs.jhu.edu> wrote:
>> >
>> >> The dependencies can be split into two kinds: ones required for
>> >> building new models, and ones needed by the decoder to translate new
>> >> sentences with a pre-built model (i.e., black-box translation with the
>> >> language packs).
>> >>
>> >> 1. For building new models, you need a way to align the words between
>> >> sentences in parallel text. Both the aligners used by Joshua (GIZA++
>> >> and the Berkeley aligner) are GPL of some form. These can be
>> >> implemented as external dependencies, or can be replaced with another
>> >> aligner, like fast_align (https://github.com/clab/fast_align), which is
>> >> Apache-licensed. There are many other options, in fact. So this should
>> >> not be a worry.
>> >>
>> >> 2. For doing black-box translation, one needs to represent the
>> >> language model, which is very large. The best tool for this is KenLM
>> >> (github.com/kpu/kenlm), which is LGPL 2.1. There is also BerkeleyLM,
>> >> which is just as good for practical purposes and is Apache-licensed.
>> >> KenLM is C++ and is loaded via the JNI, whereas BerkeleyLM is written
>> >> in Java. I have moved to including BerkeleyLM in language packs,
>> >> because I can then include the Joshua-runtime, and people can
>> >> translate without even having to compile anything.
>> >>
>> >> So in short, there are no hard dependencies on unfavorably-licensed
>> >> external projects.
>> >>
>> >> matt
>> >>
>> >>
>> >>
>> >>
>> >>> On Jan 20, 2016, at 10:08 AM, Mattmann, Chris A (3980)
>> >>> <chris.a.mattmann@jpl.nasa.gov> wrote:
>> >>>
>> >>> Hey Hen,
>> >>>
>> >>> Matt Post, who I believe is monitoring this list and who has
>> >>> been one of the key Joshua developers, and I have discussed this,
>> >>> and we believe that the GPL/LGPL dependencies can potentially:
>> >>>
>> >>> 1. be replaced with category-A or category-B alternatives. Matt
>> >>> mentioned one already to me which has slipped my mind.
>> >>> 2. be made in such a way that they are external tools and the
>> >>> bindings exist in Joshua to call those external tools (aka runtime
>> >>> deps akin to depending on a C compiler, etc.)
>> >>>
>> >>> Cheers,
>> >>> Chris
>> >>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
>> For additional commands, e-mail: general-help@incubator.apache.org
>>
>>

