opennlp-dev mailing list archives

From Joern Kottmann <kottm...@gmail.com>
Subject Re: Question about OpenNLP and comparison to e.g., NTLK, Stanford NER, etc.
Date Thu, 12 Nov 2015 22:22:34 GMT
On Thu, 2015-11-12 at 15:43 +0000, Russ, Daniel (NIH/CIT) [E] wrote:
> 1) I use the old SourceForge models.  I find that the sources of error
> in my analysis are usually not due to mistakes in sentence detection
> or POS tagging.  I don’t have the annotated data or the time/money to
> build custom models.  Yes, the text I analyze is quite different from
> the (WSJ? or whatever corpus was used to build the models), but it is
> good enough.

That is interesting; I wasn't aware that those are still useful.

It really depends on the component as well; I was mostly thinking about
the name finder models when I wrote that.

Do you only use the Sentence Detector, Tokenizer and POS tagger?
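For reference, that three-stage pipeline with the old SourceForge models looks roughly like this. This is a minimal sketch, assuming OpenNLP 1.5+ on the classpath and the classic model files (en-sent.bin, en-token.bin, en-pos-maxent.bin) in the working directory; the file paths and sample text are illustrative, not from this thread:

```java
import java.io.FileInputStream;
import java.io.InputStream;

import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSTaggerME;
import opennlp.tools.sentdetect.SentenceDetectorME;
import opennlp.tools.sentdetect.SentenceModel;
import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;

public class Pipeline {
    public static void main(String[] args) throws Exception {
        // Load the old SourceForge-era models (paths are placeholders).
        try (InputStream sentIn = new FileInputStream("en-sent.bin");
             InputStream tokIn  = new FileInputStream("en-token.bin");
             InputStream posIn  = new FileInputStream("en-pos-maxent.bin")) {

            SentenceDetectorME sentenceDetector =
                new SentenceDetectorME(new SentenceModel(sentIn));
            TokenizerME tokenizer = new TokenizerME(new TokenizerModel(tokIn));
            POSTaggerME tagger = new POSTaggerME(new POSModel(posIn));

            // Sentence detection -> tokenization -> POS tagging.
            String text = "OpenNLP ships a sentence detector. It also has a tagger.";
            for (String sentence : sentenceDetector.sentDetect(text)) {
                String[] tokens = tokenizer.tokenize(sentence);
                String[] tags = tagger.tag(tokens);
                for (int i = 0; i < tokens.length; i++) {
                    System.out.println(tokens[i] + "/" + tags[i]);
                }
            }
        }
    }
}
```

Running it requires the OpenNLP jar and the model binaries, so it is not self-contained; the API calls themselves (sentDetect, tokenize, tag) are the standard 1.5-style statistical components.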

You could use OntoNotes (almost for free) to train models. Maybe we
should look into distributing models trained on OntoNotes.

Jörn

