opennlp-users mailing list archives

From Joern Kottmann <kottm...@gmail.com>
Subject Re: OpenNLP Sentence Detector: EOS Characters
Date Thu, 09 Feb 2012 08:18:10 GMT
Did you modify the evaluation as well? If you only change the EOS characters
during training, the evaluator will not be able to consider ":" as an EOS
character.
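
A rough way to see what such a mismatch can cost is the toy calculation below. It is not OpenNLP's evaluator: the class name BoundaryF1, the example text, and the boundary-offset scoring are assumptions made for illustration only (scoring whole sentence spans would penalize a missed boundary even more). The gold data treats ":" as a boundary, while one of the two detectors never looks at ":".

```java
import java.util.HashSet;
import java.util.Set;

public class BoundaryF1 {
    /** F1 between predicted and gold sentence-boundary offsets (toy metric). */
    static double f1(Set<Integer> predicted, Set<Integer> gold) {
        Set<Integer> hits = new HashSet<>(predicted);
        hits.retainAll(gold); // boundaries found at the correct offset
        if (hits.isEmpty()) {
            return 0.0;
        }
        double precision = (double) hits.size() / predicted.size();
        double recall = (double) hits.size() / gold.size();
        return 2 * precision * recall / (precision + recall);
    }

    public static void main(String[] args) {
        // Hypothetical text: "Diagnosis: fever. Therapy: rest."
        // ':' sits at offsets 9 and 25, '.' at offsets 16 and 31.
        Set<Integer> gold = Set.of(9, 16, 25, 31);

        // A detector that does not treat ':' as an EOS candidate can never
        // predict a boundary at offsets 9 or 25.
        Set<Integer> withoutColon = Set.of(16, 31);
        // A detector that does treat ':' as a candidate can reach them.
        Set<Integer> withColon = Set.of(9, 16, 25, 31);

        System.out.println("F1 without ':' = " + f1(withoutColon, gold));
        System.out.println("F1 with ':'    = " + f1(withColon, gold));
    }
}
```

In this sketch the detector without ":" has perfect precision but only half the recall, so its F1 stays well below that of the detector that can see ":".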

To me it sounds like it fails to split on the ":" in some places.

The sentence detector uses a maxent model to classify every EOS character
as either a SPLIT or NO_SPLIT.
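
The idea can be sketched without any OpenNLP classes. Everything in the block below is an illustrative assumption: the class EosSketch, the helper names, and above all the classifier, which is a trivial rule ("split if followed by whitespace or end of text") standing in for the trained maxent model. It only shows that a character outside the EOS set is never offered to the classifier at all.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.BiPredicate;

public class EosSketch {
    /** Stand-in for the maxent model: SPLIT if followed by whitespace or end of text. */
    static final BiPredicate<String, Integer> FOLLOWED_BY_SPACE =
        (text, pos) -> pos + 1 == text.length()
            || Character.isWhitespace(text.charAt(pos + 1));

    /** Offsets of every character in text that belongs to the EOS set. */
    static List<Integer> candidates(String text, Set<Character> eosChars) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            if (eosChars.contains(text.charAt(i))) {
                positions.add(i);
            }
        }
        return positions;
    }

    /** Splits after each candidate the classifier labels as SPLIT. */
    static List<String> split(String text, Set<Character> eosChars,
                              BiPredicate<String, Integer> isSplit) {
        List<String> sentences = new ArrayList<>();
        int start = 0;
        for (int pos : candidates(text, eosChars)) {
            if (isSplit.test(text, pos)) { // SPLIT; otherwise NO_SPLIT
                String sentence = text.substring(start, pos + 1).trim();
                if (!sentence.isEmpty()) {
                    sentences.add(sentence);
                }
                start = pos + 1;
            }
        }
        String rest = text.substring(start).trim();
        if (!rest.isEmpty()) {
            sentences.add(rest);
        }
        return sentences;
    }

    public static void main(String[] args) {
        String text = "Diagnosis: fever. Therapy: rest.";
        System.out.println(split(text, Set.of('.', '!', '?'), FOLLOWED_BY_SPACE));
        System.out.println(split(text, Set.of('.', '!', '?', ':'), FOLLOWED_BY_SPACE));
    }
}
```

With the default-like set the example text yields two sentences; once ":" is in the set, the same stand-in classifier yields four, because the two colons are now candidates as well.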

Jörn

On Thu, Feb 9, 2012 at 8:59 AM, Katrin Tomanek
<katrin.tomanek@averbis.com> wrote:

> Hi Willian,
>
> I am currently using opennlp-1.5.2 and try to use it as an API, i.e. not
> to modify its code but to write my own code around it. However, what I
> described below (with the SDEventStream) amounts to the same thing you
> are describing: I am changing the set of EOS characters.
>
> I am just wondering why adding ":" as an EOS character makes the results
> worse (dropping from ~80 F to 45 F in sentence splitting, and ":" is always
> a sentence boundary symbol in my data!)
>
> Looks like I need to debug a little bit more what's happening in the
> DefaultSDContextGenerator.
>
