opennlp-dev mailing list archives

From Jörn Kottmann <kottm...@gmail.com>
Subject Re: POSTagger Perceptron API
Date Thu, 12 Jan 2012 14:56:06 GMT
On 1/12/12 3:42 PM, Svetoslav Marinov wrote:
> Hi all,
>
> There is a Perceptron model for the Swedish POS tagger. How does one call it with the API?
> I checked the API pages as well as the documentation, but there is only a reference to
> the MaxEnt model:
>
> POSTaggerME tagger  = new POSTaggerME(model);
>
> So what is the method for using the Perceptron model?

The decision is made at training time: depending on the settings, either
maxent or perceptron is used to train the model. The resulting model can
be loaded with the code above, and OpenNLP takes care of setting everything
up correctly behind the scenes.
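
A minimal sketch of what that looks like in Java, assuming the OpenNLP tools
jar is on the classpath. The model file name and the sample sentence are
hypothetical placeholders, and the join helper exists only for this
illustration. The loading and tagging calls are the same whichever algorithm
produced the model:

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSTaggerME;

public class TagWithAnyModel {

    // Illustrative helper: pairs each token with its tag as "token_TAG".
    static String join(String[] tokens, String[] tags) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(tokens[i]).append('_').append(tags[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical file name; pass the real model path as the first argument.
        String modelPath = args.length > 0 ? args[0] : "sv-pos-perceptron.bin";

        POSModel model;
        try (InputStream in = new FileInputStream(modelPath)) {
            // The serialized model records how it was trained (maxent or perceptron).
            model = new POSModel(in);
        }

        // Same class as in the question, despite the "ME" suffix.
        POSTaggerME tagger = new POSTaggerME(model);

        // The tagger expects an already tokenized sentence (placeholder input).
        String[] tokens = {"Det", "regnar", "idag", "."};
        String[] tags = tagger.tag(tokens);

        System.out.println(join(tokens, tags));
    }
}
```

The tags printed depend entirely on the model that is loaded, so no output
is shown here.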

We distribute a perceptron model for English.

For information about how to set the training algorithm please consult
our documentation:
http://incubator.apache.org/opennlp/documentation/1.5.2-incubating/manual/opennlp.html#tools.postagger.training


> I am also curious about the performance of the trained models. Is there any reference
> to precision/recall? Can one get in touch with the people who trained the available models?
>
> If one creates a new model (say for sentence detection, or POS tagging with a different
> set of POS tags), can one upload it?
>

We currently don't have a way to share models or to handle their
distribution, mostly because of copyright/legal issues.
We think the way to fix this is to share open source training data.

Anyway, we have some instructions on how to train the POS tagger on
various public corpora in our documentation.
I suggest that you take a look there:
http://incubator.apache.org/opennlp/documentation/1.5.2-incubating/manual/opennlp.html#tools.corpora

Hope that helps,
Jörn
