opennlp-users mailing list archives

From Jason Baldridge <>
Subject Re: Document Classification
Date Tue, 24 Apr 2012 01:53:14 GMT
FWIW, there will be more classification capabilities coming in the next
several months.


On Mon, Apr 23, 2012 at 5:12 PM, Jörn Kottmann <> wrote:

> OpenNLP is using either a Maxent or Perceptron classifier
> to classify a piece of text. This can give you back the probabilities
> for the various categories, but it's not designed to tell you how
> much each topic is represented in your input document.
> You could take a document and assume each paragraph has one topic
> and then classify it paragraph by paragraph.
> We sadly don't have support for topic models, such as LDA.
> All the training logs are still written to the console; we have plans
> to properly capture them and report the training progress back via an
> API. This output should then be logged and maybe just stored inside
> the model for later debugging.
> Jörn
> On 04/23/2012 07:41 PM, Alex Kudlick wrote:
>> Hi,
>> I've just started using OpenNLP for a project to classify scientific
>> articles into subjects.  I have a few questions:
>> 1. How do I configure logging for the model? I'm using slf4j-log4j for the
>> rest of my application, but the training output from the model just goes
>> to stdout.
>> 2. Is there any support for classifying documents with multiple classes?
>> For instance, a given article may be classified as Computational Biology,
>> Cell Biology, and Molecular Biology.
>> Thanks,
>> Alex Kudlick
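
The paragraph-by-paragraph idea quoted above can be sketched roughly as
follows. This is only an illustration under stated assumptions: the class
name ParagraphTopics, the method topicShares, and the keyword-based
classifier in main are all hypothetical and are not part of OpenNLP. In
practice the classifier function would wrap a trained document categorizer
and return its best category for each paragraph.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

final class ParagraphTopics {

    /**
     * Splits a document on blank lines, classifies each paragraph with the
     * given single-label classifier, and returns the fraction of paragraphs
     * assigned to each category.
     *
     * The classifier argument is a hypothetical stand-in for a trained model.
     */
    static Map<String, Double> topicShares(String document,
                                           Function<String, String> classifier) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        int total = 0;
        // A paragraph boundary is assumed to be one or more blank lines.
        for (String paragraph : document.split("\\n\\s*\\n")) {
            String text = paragraph.trim();
            if (text.isEmpty()) {
                continue; // skip empty fragments
            }
            counts.merge(classifier.apply(text), 1, Integer::sum);
            total++;
        }
        // Convert per-category counts into shares of the whole document.
        Map<String, Double> shares = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            shares.put(entry.getKey(), entry.getValue() / (double) total);
        }
        return shares;
    }

    public static void main(String[] args) {
        // Toy keyword classifier, used only to make the sketch runnable.
        Function<String, String> toy = p ->
                p.contains("cell") ? "Cell Biology" : "Computational Biology";
        String doc = "cell membrane transport\n\n"
                   + "sequence alignment algorithms\n\n"
                   + "cell division";
        System.out.println(topicShares(doc, toy));
    }
}
```

The shares only approximate how much of a document each subject covers,
since every paragraph is forced into exactly one category; a threshold on
the shares could then decide which labels to attach to the article.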

Jason Baldridge
Associate Professor, Department of Linguistics
The University of Texas at Austin
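
Regarding the training output that goes to stdout: until it can be captured
through an API, one generic JVM workaround (not something proposed in this
thread, and not an OpenNLP feature) is to redirect System.out temporarily
and forward the captured lines to the application's logger. A minimal
sketch, assuming the hypothetical names StdoutCapture and runAndForward,
with a placeholder task standing in for the actual training call:

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.util.function.Consumer;
import java.util.logging.Logger;

final class StdoutCapture {

    /**
     * Runs the task with System.out redirected to an in-memory buffer, then
     * passes each non-empty captured line to the sink (for example a logger).
     */
    static void runAndForward(Runnable task, Consumer<String> sink) {
        PrintStream original = System.out;
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintStream capture = new PrintStream(buffer, true);
        System.setOut(capture);
        try {
            task.run();
        } finally {
            System.setOut(original); // always restore the real stdout
            capture.flush();
        }
        for (String line : buffer.toString().split("\\r?\\n")) {
            if (!line.isEmpty()) {
                sink.accept(line);
            }
        }
    }

    public static void main(String[] args) {
        Logger log = Logger.getLogger("training");
        // Placeholder task; a real caller would train the model here.
        Runnable task = () -> System.out.println("placeholder training output");
        runAndForward(task, line -> log.info(line));
    }
}
```

Note that System.setOut affects the whole process, so output from other
threads is captured as well, and the lines reach the logger only after the
task has finished.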
