opennlp-dev mailing list archives

From Jörn Kottmann <kottm...@gmail.com>
Subject Re: Getting our first release out
Date Tue, 01 Feb 2011 22:05:58 GMT
On 2/1/11 10:45 PM, Grant Ingersoll wrote:
> Yes, we should start assembling a list of corpora, if only so that we
> have it for others who come later and want to reproduce the models. In
> the meantime, I agree that we can just keep the models elsewhere. We
> don't have to provide models. They are a convenience for all involved,
> but not a requirement in order to run. I wonder how many people
> actually train their own. (BTW, we should update our website to point
> to older models, too. They are really hard to find unless you do some
> URL rewriting.)

OK, then let's get the release out as quickly as possible without
depending on the legal issues for the models.
And let's do as much as possible to resolve these issues alongside
the release work. I might have a few spare cycles here and there to
work on that.

To get started with the legal stuff we need to compile a list with all
the necessary information; that list will make a nice corpora page in
our wiki.
Our documentation already contains instructions on how to train on some
freely available data.

In the end I believe we are all best served by a Wikinews corpus which
can be labeled by our community.

Jörn
