opennlp-dev mailing list archives

From Michael Schmitz <sch...@cs.washington.edu>
Subject Re: Host stock models in maven central
Date Wed, 08 Aug 2012 04:16:32 GMT
Hi, here are some models trained on Wikipedia data.  They have similar
performance.  Is this useful?

https://gist.github.com/3291931

Peace.  Michael


On Fri, Jun 29, 2012 at 7:43 PM, Michael Schmitz <schmmd@cs.washington.edu> wrote:

> Well, if I find time, I'll run the models on an Apache-license dataset
> and then train new models using the output.  I'm sure this would be
> safe from licensing issues and if we had any time, we could clean up
> the annotations.
>
> Peace.  Michael
>
>
> On Sun, Jun 24, 2012 at 6:52 PM, Benson Margulies <bimargulies@gmail.com> wrote:
> > On Sun, Jun 24, 2012 at 9:48 PM, James Kosin <james.kosin@gmail.com> wrote:
> >> Hi Michael,
> >>
> >> Sorry about the late response to this.
> >>
> >> Yes, it is; however, they also restrict the distribution of the models.
> >> The license allows us to use them for research purposes only, and we are
> >> not allowed to redistribute the models.  I've already asked the person in
> >> charge of distributing the corpus about this.
> >>
> >> None of OpenNLP's models are based on this corpus as far as I know.  All
> >> the models are produced from sources with different copyrights and
> >> limitations.  The Apache license, however, doesn't allow binary-only
> >> distribution with no way of producing or reproducing the binaries from
> >> our own sources, which must be licensed under the Apache license.  The
> >> best we can do right now is to distribute the sources and binaries for
> >> the Java classes, and work on producing a corpus of our own from
> >> non-copyrighted text, then distribute those sources and models at Apache
> >> under the Apache license.
> >
> > Also note that nothing stops someone else from distributing binary
> > models outside of Apache. Anyone who wanted to pick up the corpora and
> > reach their own conclusion about the legitimacy of open distribution
> > of binary models could build these models and distribute them via
> > OSSRH to Maven Central, just so long as they respect ASF trademark
> > policies in describing the models as, oh, 'useful with the Apache
> > OpenNLP software library'.
> >
> >
> >
> >>
> >> James
> >>
> >> On 6/12/2012 12:37 PM, Michael Schmitz wrote:
> >>> Hi James, is this the contract?
> >>>
> >>> http://trec.nist.gov/data/reuters/org_appl_reuters_v4.html
> >>>
> >>> If so, I think you are free to license your derived models however you
> >>> please although you may not redistribute the training data.
> >>>
> >>> What models does the Reuters contract apply to?
> >>>
> >>> Peace.  Michael
> >>>
> >>>
> >>> On Mon, Jun 11, 2012 at 7:23 PM, James Kosin <james.kosin@gmail.com> wrote:
> >>>> Michael,
> >>>>
> >>>> I only have the contract for the Reuters corpus I use, and it
> >>>> specifically prohibits use for anything other than educational or
> >>>> research purposes.  Commercial applications violate the copyright and
> >>>> contract terms.  I'm sure many of the others are similar.  This includes
> >>>> any trained models.
> >>>>
> >>>> James
> >>>>
> >>>> On 6/11/2012 1:45 PM, Michael Schmitz wrote:
> >>>>> Are you sure the copyright applies to your trained model?  Do you have
> >>>>> any information about the corpora you used to train the models?
> >>>>>
> >>>>> Peace.  Michael
> >>>>>
> >>>>>
> >>>>> On Sat, Jun 9, 2012 at 3:44 PM, James Kosin <james.kosin@gmail.com> wrote:
> >>>>>> Michael,
> >>>>>>
> >>>>>> It is one of the things we are working on.  The problem is that most
> >>>>>> if not all of the models are currently trained on copyrighted material
> >>>>>> that restricts the usage of the resulting trained models to research
> >>>>>> purposes ONLY.  We currently host the models on another site due to
> >>>>>> this limitation and the licensing conflict that would result if we
> >>>>>> tried to host them at Apache.
> >>>>>>
> >>>>>> You are more than welcome to help, if you choose.
> >>>>>>
> >>>>>> James
> >>>>>>
> >>>>>> On 6/8/2012 6:55 PM, Michael Schmitz wrote:
> >>>>>>> Hi, is there any interest in hosting the stock OpenNLP models in
> >>>>>>> Maven Central?  I know that OpenNLP intends for users to train models
> >>>>>>> on their particular corpus, but it's often useful to get started with
> >>>>>>> the stock models.
> >>>>>>>
> >>>>>>> I'm developing a common interface to some NLP toolkits in Scala and
> >>>>>>> would like to include OpenNLP.  I would like to use OpenNLP and pull
> >>>>>>> in the stock models by default as a Maven dependency.  If I do this,
> >>>>>>> then I don't need to include the models with my artifact and I don't
> >>>>>>> need to keep the models in my git repository.  More importantly,
> >>>>>>> users can exclude the stock models if they wish.
> >>>>>>>
> >>>>>>> What do you think?
> >>>>>>>
> >>>>>>> Peace.  Michael
> >>>>
> >>
> >>
>
