opennlp-users mailing list archives

From Jörn Kottmann
Subject Re: Document Classification
Date Mon, 23 Apr 2012 22:12:15 GMT
OpenNLP uses either a Maxent or a Perceptron classifier
to classify a piece of text. It can give you back the probabilities
for the various categories, but it's not designed to tell you how
much each topic is represented in your input document.

You could take a document and assume each paragraph has one topic
and then classify it paragraph by paragraph.
We sadly don't have support for topic models, such as LDA.
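
The paragraph-by-paragraph idea could be sketched roughly as follows.
This is only an illustration: `classifier` is a hypothetical stand-in for
a trained categorizer that returns the single best category for a piece
of text, and splitting on blank lines is an assumption about how
paragraphs are delimited in your documents.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

public class ParagraphTopics {

    // Splits a document on blank lines, classifies each paragraph, and
    // returns the fraction of paragraphs assigned to each category.
    // `classifier` is a hypothetical placeholder, not an OpenNLP type.
    public static Map<String, Double> topicShares(String document,
                                                  Function<String, String> classifier) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        int total = 0;
        for (String paragraph : document.split("\\n\\s*\\n")) {
            String text = paragraph.trim();
            if (text.isEmpty()) {
                continue; // skip empty chunks, e.g. from an empty document
            }
            counts.merge(classifier.apply(text), 1, Integer::sum);
            total++;
        }
        Map<String, Double> shares = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            shares.put(e.getKey(), e.getValue() / (double) total);
        }
        return shares;
    }

    public static void main(String[] args) {
        // Toy keyword rule standing in for a real model, for illustration only.
        Function<String, String> toy = p ->
                p.contains("protein") ? "Molecular Biology" : "Computational Biology";
        String doc = "We model protein folding.\n\n"
                + "We describe the algorithm.\n\n"
                + "The protein binds to the receptor.";
        System.out.println(topicShares(doc, toy));
    }
}
```

The resulting shares only say how many paragraphs were assigned to each
category. They are not a topic model, and they are only as good as the
one-topic-per-paragraph assumption.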

All the training logs are still written to the console. We have plans
to properly capture them and report the training progress back via an
API. This output should then be logged and maybe also stored inside
the model for later debugging.
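
Until such an API exists, one possible workaround is to redirect
System.out temporarily while training runs and forward the captured
lines to your own logger. The sketch below is an assumption about how
that might be done, not something OpenNLP provides: `runAndForward` is a
hypothetical helper, `task` stands for whatever code starts the
training, and `sink` could be a method reference such as an SLF4J
logger's `info`.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.util.function.Consumer;
import java.util.logging.Logger;

public class StdoutCapture {

    // Runs `task` with System.out redirected to an in-memory buffer, then
    // restores the original stream and passes each captured line to `sink`.
    public static void runAndForward(Runnable task, Consumer<String> sink) {
        PrintStream original = System.out;
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintStream capture = new PrintStream(buffer, true);
        System.setOut(capture);
        try {
            task.run();
        } finally {
            capture.flush();
            System.setOut(original); // always restore, even if the task throws
            for (String line : buffer.toString().split("\\r?\\n")) {
                if (!line.isEmpty()) {
                    sink.accept(line);
                }
            }
        }
    }

    public static void main(String[] args) {
        Logger log = Logger.getLogger("training");
        // The println below is a placeholder for the real training call.
        runAndForward(() -> System.out.println("placeholder training output"),
                log::info);
    }
}
```

System.setOut affects the whole JVM, so anything other threads print to
stdout during training is captured as well, and the lines only reach the
logger once the task has finished.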


On 04/23/2012 07:41 PM, Alex Kudlick wrote:
> Hi,
> I've just started using OpenNLP for a project to classify scientific
> articles into subjects. I have a few questions:
> 1. How do I configure logging for the model? I'm using slf4j-log4j for the
> rest of my application, but the training output from the model just goes to
> stdout.
> 2. Is there any support for classifying documents with multiple classes?
> For instance, a given article may be classified as Computational Biology,
> Cell Biology, and Molecular Biology.
> Thanks,
> Alex Kudlick
