opennlp-users mailing list archives

From Jörn Kottmann <kottm...@gmail.com>
Subject Re: Document Classification
Date Tue, 24 Apr 2012 07:26:51 GMT
What are you planning to add?

Jörn

On 04/24/2012 03:53 AM, Jason Baldridge wrote:
> FWIW, there will be more classification capabilities coming in the next
> several months.
>
> -Jason
>
> On Mon, Apr 23, 2012 at 5:12 PM, Jörn Kottmann <kottmann@gmail.com> wrote:
>
>> OpenNLP uses either a Maxent or a Perceptron classifier
>> to classify a piece of text. This can give you back the probabilities
>> for the various categories, but it's not designed to tell you how
>> much each topic is represented in your input document.
>>
>> You could take a document and assume each paragraph has one topic
>> and then classify it paragraph by paragraph.
>> We sadly don't have support for topic models, such as LDA.
>>
>> All the training logs are still written to the console. We have plans
>> to capture them properly and report training progress back via an
>> API. This output should then be logged and maybe just stored inside
>> the model for later debugging.
>>
>> Jörn
>>
>>
>> On 04/23/2012 07:41 PM, Alex Kudlick wrote:
>>
>>> Hi,
>>>
>>> I've just started using OpenNLP for a project to classify scientific
>>> articles into subjects. I have a few questions:
>>>
>>> 1. How do I configure logging for the model? I'm using slf4j-log4j for the
>>> rest of my application, but the training output from the model just goes
>>> to stdout.
>>>
>>> 2. Is there any support for classifying documents with multiple classes?
>>> For instance, a given article may be classified as Computational Biology,
>>> Cell Biology, and Molecular Biology.
>>>
>>> Thanks,
>>>
>>> Alex Kudlick
>>>
>>>
>
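The suggestion in the thread to assume one topic per paragraph and classify paragraph by paragraph could be sketched roughly as follows. This is only an illustration under stated assumptions: the classifier is passed in as a plain function from paragraph text to a category name, and the class name ParagraphTopics, the method topicShares, and the toy keyword classifier are all hypothetical. In OpenNLP itself that role would presumably be filled by a trained document categorizer; the wiring to it is left out here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

public class ParagraphTopics {

    /**
     * Splits a document on blank lines, asks the classifier for one category
     * per paragraph, and returns the fraction of paragraphs per category.
     * The classifier is a stand-in for a trained model (assumption).
     */
    public static Map<String, Double> topicShares(String document,
                                                  Function<String, String> classifier) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        int total = 0;
        // A paragraph boundary is taken to be one or more blank lines.
        for (String paragraph : document.split("\\n\\s*\\n")) {
            String text = paragraph.trim();
            if (text.isEmpty()) {
                continue;
            }
            counts.merge(classifier.apply(text), 1, Integer::sum);
            total++;
        }
        Map<String, Double> shares = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            shares.put(e.getKey(), e.getValue() / (double) total);
        }
        return shares;
    }

    public static void main(String[] args) {
        // Toy keyword rule, used only so the sketch runs without a model.
        Function<String, String> toy = p ->
                p.toLowerCase().contains("algorithm") ? "Computational Biology"
                                                      : "Cell Biology";
        String doc = "We describe an algorithm for sequence alignment.\n\n"
                   + "Cells were cultured for 48 hours.\n\n"
                   + "The algorithm runs in linear time.";
        System.out.println(topicShares(doc, toy));
    }
}
```

The resulting shares are only a crude proxy for how much each topic is represented: every paragraph counts equally regardless of length, and a paragraph that mixes topics still gets a single label.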

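Since the thread says the training output still goes to the console, one possible stopgap (not something proposed in the thread) is to redirect System.out temporarily while training runs and forward the captured lines to a logger. The sketch below uses java.util.logging only to stay self-contained; an application using slf4j would call its own logger instead. The class name StdoutCapture, the method captureStdout, and the stand-in "training" task are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.util.logging.Logger;

public class StdoutCapture {

    /**
     * Runs the task with System.out redirected to an in-memory buffer and
     * returns whatever the task printed. System.out is restored afterwards,
     * even if the task throws.
     */
    public static String captureStdout(Runnable task) {
        PrintStream original = System.out;
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintStream capture = new PrintStream(buffer, true);
        System.setOut(capture);
        try {
            task.run();
        } finally {
            capture.flush();
            System.setOut(original);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        Logger log = Logger.getLogger("training");
        // Stand-in for a training call that prints progress to stdout.
        String output = captureStdout(
                () -> System.out.println("training iteration 1 (stand-in)"));
        for (String line : output.split("\\R")) {
            if (!line.isEmpty()) {
                log.info(line);
            }
        }
    }
}
```

Note that System.setOut affects the whole JVM, so anything other threads print during the redirected window is captured too, and the lines are only forwarded once the task finishes rather than as they are produced.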
