opennlp-users mailing list archives

From Jason Baldridge <jasonbaldri...@gmail.com>
Subject Re: Document Classification
Date Tue, 24 Apr 2012 01:53:14 GMT
FWIW, there will be more classification capabilities coming in the next
several months.

-Jason

On Mon, Apr 23, 2012 at 5:12 PM, Jörn Kottmann <kottmann@gmail.com> wrote:

> OpenNLP uses either a Maxent or a Perceptron classifier
> to classify a piece of text. This can give you back the probabilities
> for the various categories, but it's not designed to tell you how
> much each topic is represented in your input document.
>
> You could take a document and assume each paragraph has one topic
> and then classify it paragraph by paragraph.
> We sadly don't have support for topic models, such as LDA.
>
> All the training logs are still written to the console. We have plans
> to properly capture them and report training progress back via an
> API. This output should then be logged and maybe just stored inside
> the model for later debugging.
>
> Jörn
>
>
> On 04/23/2012 07:41 PM, Alex Kudlick wrote:
>
>> Hi,
>>
>> I've just started using OpenNLP for a project to classify scientific
>> articles into subjects.  I have a few questions:
>>
>> 1. How do I configure logging for the model? I'm using slf4j-log4j for the
>> rest of my application, but the training output from the model just goes
>> to stdout.
>>
>> 2. Is there any support for classifying documents with multiple classes?
>> For instance, a given article may be classified as Computational Biology,
>> Cell Biology, and Molecular Biology.
>>
>> Thanks,
>>
>> Alex Kudlick
>>
>>
>
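
The paragraph-by-paragraph approach suggested in the quoted message could be
sketched roughly as follows. This is only an illustration, not OpenNLP code:
the class ParagraphTopics, the method topicShares, and the classifier
argument are hypothetical names, and the classifier is a stand-in for
whatever single-label categorizer you trained. It assumes paragraphs are
separated by blank lines and that each paragraph has exactly one topic.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper, not part of OpenNLP.
final class ParagraphTopics {

    /**
     * Splits the document on blank lines, classifies each paragraph with the
     * supplied single-label classifier, and returns the fraction of
     * paragraphs assigned to each category.
     */
    static Map<String, Double> topicShares(String document,
                                           Function<String, String> classifier) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        int total = 0;
        // Assumption: a blank line (possibly containing whitespace) ends a paragraph.
        for (String paragraph : document.split("\\n\\s*\\n")) {
            String text = paragraph.trim();
            if (text.isEmpty()) {
                continue;
            }
            counts.merge(classifier.apply(text), 1, Integer::sum);
            total++;
        }
        Map<String, Double> shares = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            shares.put(entry.getKey(), entry.getValue() / (double) total);
        }
        return shares;
    }
}
```

Every category with a non-zero share can then be treated as one of the
document's classes. The shares are only paragraph counts, so a long paragraph
weighs the same as a short one.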

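Until the training output is reported through an API, one possible workaround
for the logging question is to redirect System.out while training runs and
pass the captured lines to a logger. The sketch below assumes the trainer
prints to System.out; StdoutCapture, capture and runAndLog are hypothetical
names, the training call itself is left to the caller as a Runnable, and
java.util.logging is used only to keep the example free of dependencies.
System.setOut is process-wide, so output from other threads is captured too.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;
import java.util.logging.Logger;

// Hypothetical helper, not part of OpenNLP.
final class StdoutCapture {

    /** Runs the task with System.out redirected to a buffer and returns what was printed. */
    static String capture(Runnable task) {
        PrintStream original = System.out;
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintStream redirected;
        try {
            redirected = new PrintStream(buffer, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
        System.setOut(redirected);
        try {
            task.run();
        } finally {
            // Always restore the real stdout, even if the task throws.
            System.setOut(original);
            redirected.flush();
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    /** Runs the task and forwards each non-empty captured line to the logger at INFO level. */
    static void runAndLog(Runnable task, Logger logger) {
        for (String line : capture(task).split("\\R")) {
            if (!line.isEmpty()) {
                logger.info(line);
            }
        }
    }
}
```

With slf4j the same idea applies: replace the java.util.logging Logger with
your own logger inside runAndLog.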

-- 
Jason Baldridge
Associate Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge
