opennlp-dev mailing list archives

From Rupert Westenthaler <rupert.westentha...@gmail.com>
Subject POS Tagger Probability changes between OpenNLP 1.5.1 and 1.5.2
Date Mon, 21 May 2012 11:49:51 GMT
Hi,

While debugging why POS tags have recently been ignored by the Apache
Stanbol Enhancer, I noticed that the reason was that with OpenNLP
1.5.2 the probabilities returned by the POS tagger have changed.

Previously, typical probabilities of POS tags were > 0.9 for most
tokens. Because of that, a configuration that ignores POS tags with a
probability < 0.8 looked like a reasonable default. However, with
OpenNLP 1.5.2 the probabilities are much lower. At first it even looks
as if 1.5.2 now returns the uncertainty ('1-{probability}') instead of
the probability, but after looking a little into the source this seems
unlikely to me.
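
For reference, a minimal sketch of the two checks described above, using
only the per-token values listed further down in this message. The class
and method names are hypothetical and not part of OpenNLP or Stanbol; the
0.8 threshold is the default mentioned above. The complement check assumes
that the underlying model probabilities are the same in both versions.

```java
import java.util.Arrays;

public class PosThresholdCheck {

    // Per-token probabilities copied from this message (OpenNLP 1.5.1).
    static final double[] PROBS_151 = {
        1.0, 1.0, 0.9999999952604672, 0.9999999999971082, 1.0,
        0.9988748880601196, 0.9999999702598833, 1.0, 0.9999999999989716,
        0.9999998327848956
    };

    // Per-token probabilities copied from this message (OpenNLP 1.5.2).
    static final double[] PROBS_152 = {
        0.05013598125548828, 0.053016102976047086, 0.04032588713661259,
        0.03995389549856565, 0.04685198986899964, 0.03659501930208113,
        0.04132356969119329, 0.06434037591280849, 0.046311143933396866,
        0.04233395769746884
    };

    /** Counts the tokens whose tag probability is at or above minProb. */
    static long countAccepted(double[] probs, double minProb) {
        return Arrays.stream(probs).filter(p -> p >= minProb).count();
    }

    /** Returns true if every b[i] equals 1 - a[i] within the tolerance. */
    static boolean isComplement(double[] a, double[] b, double tol) {
        if (a.length != b.length) {
            return false;
        }
        for (int i = 0; i < a.length; i++) {
            if (Math.abs((1.0 - a[i]) - b[i]) > tol) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        double minProb = 0.8; // threshold below which POS tags are ignored
        System.out.println("1.5.1 accepted: "
                + countAccepted(PROBS_151, minProb) + " of " + PROBS_151.length);
        System.out.println("1.5.2 accepted: "
                + countAccepted(PROBS_152, minProb) + " of " + PROBS_152.length);
        System.out.println("1.5.2 is 1 - (1.5.1): "
                + isComplement(PROBS_151, PROBS_152, 0.01));
    }
}
```

With these values, all ten tags pass the threshold for 1.5.1 and none for
1.5.2. The complement check fails: 1 - p for the 1.5.1 values is close to
0, not close to 0.04-0.06.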

I have already searched the documentation and recent Jira issues, but
I could not find anything related.

As an example, here are the results for a single sentence analyzed
using OpenNLP 1.5.1 and 1.5.2.

Sentence:

    A nice travel to the biggest volcano of Mexico.

The tokens are as expected in both cases.

With OpenNLP 1.5.1 I get the following top sequence when calling
POSTaggerME#topKSequences(tokens):

-0.0011259470521596032 [DT, JJ, NN, TO, DT, JJS, NN, IN, NNP, .]

Detailed Probabilities:

[1.0, 1.0, 0.9999999952604672, 0.9999999999971082, 1.0,
0.9988748880601196, 0.9999999702598833, 1.0, 0.9999999999989716,
0.9999998327848956]

Switching to OpenNLP 1.5.2 results in:

-30.89400016135042 [DT, JJ, NN, TO, DT, JJS, NN, IN, NNP, .]

Detailed Probabilities:

[0.05013598125548828, 0.053016102976047086, 0.04032588713661259,
0.03995389549856565, 0.04685198986899964, 0.03659501930208113,
0.04132356969119329, 0.06434037591280849, 0.046311143933396866,
0.04233395769746884]
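
The sequence scores and the detailed probabilities above appear to be
consistent with each other in both versions: each score matches the sum
of the natural logarithms of the per-token probabilities. The following
is a small sketch that checks this using only the numbers from this
message; the class and method names are hypothetical and nothing here
calls OpenNLP itself.

```java
public class SequenceScoreCheck {

    // Per-token probabilities and reported score for OpenNLP 1.5.1.
    static final double[] PROBS_151 = {
        1.0, 1.0, 0.9999999952604672, 0.9999999999971082, 1.0,
        0.9988748880601196, 0.9999999702598833, 1.0, 0.9999999999989716,
        0.9999998327848956
    };
    static final double SCORE_151 = -0.0011259470521596032;

    // Per-token probabilities and reported score for OpenNLP 1.5.2.
    static final double[] PROBS_152 = {
        0.05013598125548828, 0.053016102976047086, 0.04032588713661259,
        0.03995389549856565, 0.04685198986899964, 0.03659501930208113,
        0.04132356969119329, 0.06434037591280849, 0.046311143933396866,
        0.04233395769746884
    };
    static final double SCORE_152 = -30.89400016135042;

    /** Sums the natural logarithms of the given probabilities. */
    static double logSum(double[] probs) {
        double sum = 0.0;
        for (double p : probs) {
            sum += Math.log(p);
        }
        return sum;
    }

    /** Geometric mean of the probabilities, i.e. exp(logSum / n). */
    static double geometricMean(double[] probs) {
        return Math.exp(logSum(probs) / probs.length);
    }

    public static void main(String[] args) {
        System.out.println("1.5.1: log sum = " + logSum(PROBS_151)
                + ", reported score = " + SCORE_151);
        System.out.println("1.5.2: log sum = " + logSum(PROBS_152)
                + ", reported score = " + SCORE_152);
        System.out.println("1.5.2: geometric mean probability = "
                + geometricMean(PROBS_152));
    }
}
```

So the drop in the sequence score follows directly from the lower
per-token probabilities; the tag sequence itself is identical in both
versions.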


Is this a bug or an intentional change? If the latter, it would be great
if someone could provide a link to the documentation.

best
Rupert Westenthaler


p.s:

By OpenNLP 1.5.1 I mean:

    opennlp-tools-1.5.1-incubating.jar
    opennlp-maxent-3.0.1-incubating.jar

By OpenNLP 1.5.2 I mean:

    opennlp-tools-1.5.2-incubating.jar
    opennlp-maxent-3.0.2-incubating.jar

In both cases the "en-pos-maxent.bin" model as available via opennlp.sf.net is used.

-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen
