opennlp-users mailing list archives

From Jason Baldridge <jasonbaldri...@gmail.com>
Subject Re: Document Classification
Date Tue, 24 Apr 2012 14:17:19 GMT
Naive Bayes, perceptron variants (incl. passive-aggressive), faster training
for maxent, and a better overall architecture. These are things my students
and I are working on independently, and I will bring in to OpenNLP when
time frees up to do so.

On Tue, Apr 24, 2012 at 2:26 AM, Jörn Kottmann <kottmann@gmail.com> wrote:

> What are you planning to add?
>
> Jörn
>
>
> On 04/24/2012 03:53 AM, Jason Baldridge wrote:
>
>> FWIW, there will be more classification capabilities coming in the next
>> several months.
>>
>> -Jason
>>
>> On Mon, Apr 23, 2012 at 5:12 PM, Jörn Kottmann <kottmann@gmail.com> wrote:
>>
>>  OpenNLP is using either a Maxent or Perceptron classifier
>>> to classify a piece of text. This can give you back the probabilities
>>> for the various categories, but it's not designed to tell you how
>>> much each topic is represented in your input document.
>>>
>>> You could take a document and assume each paragraph has one topic
>>> and then classify it paragraph by paragraph.
>>> We sadly don't have support for topic models, such as LDA.
>>>
>>> All the training logs are still written to the console; we have plans
>>> to properly capture them and report training progress back via an
>>> API. This output should then be logged and maybe just stored inside
>>> the model for later debugging.
>>>
>>> Jörn
>>>
>>>
>>> On 04/23/2012 07:41 PM, Alex Kudlick wrote:
>>>
>>>  Hi,
>>>>
>>>> I've just started using OpenNLP for a project to classify scientific
>>>> articles into subjects.  I have a few questions:
>>>>
>>>> 1. How do I configure logging for the model? I'm using slf4j-log4j for
>>>> the
>>>> rest of my application, but the training output from the model just goes
>>>> to
>>>> stdout.
>>>>
>>>> 2. Is there any support for classifying documents with multiple classes?
>>>> For instance, a given article may be classified as Computational
>>>> Biology,
>>>> Cell Biology, and Molecular Biology.
>>>>
>>>> Thanks,
>>>>
>>>> Alex Kudlick
>>>>
>>>>
>>>>
>>
>


-- 
Jason Baldridge
Associate Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge
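
Jörn's suggestion in the quoted message (assume one topic per paragraph and classify paragraph by paragraph) could be sketched roughly as follows. This is only an illustration under stated assumptions, not OpenNLP code: the class name `ParagraphTopics`, the method `topicShares`, and the `classifier` function (paragraph text in, category-to-probability map out) are hypothetical names, and paragraphs are assumed to be separated by blank lines.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper, not part of OpenNLP.
class ParagraphTopics {

    /**
     * Splits a document on blank lines, classifies each paragraph with the
     * supplied function, and returns the fraction of paragraphs whose most
     * probable category is each label.
     */
    static Map<String, Double> topicShares(
            String document,
            Function<String, Map<String, Double>> classifier) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        int total = 0;
        // Assumption: a paragraph break is a line break, optional whitespace,
        // and another line break.
        for (String paragraph : document.split("\\R\\s*\\R")) {
            if (paragraph.trim().isEmpty()) {
                continue; // skip empty chunks
            }
            // Pick the category with the highest probability for this paragraph.
            String best = null;
            double bestP = Double.NEGATIVE_INFINITY;
            for (Map.Entry<String, Double> e : classifier.apply(paragraph).entrySet()) {
                if (e.getValue() > bestP) {
                    bestP = e.getValue();
                    best = e.getKey();
                }
            }
            if (best != null) {
                counts.merge(best, 1, Integer::sum);
                total++;
            }
        }
        // Convert per-category paragraph counts into shares of the document.
        Map<String, Double> shares = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            shares.put(e.getKey(), e.getValue() / (double) total);
        }
        return shares;
    }
}
```

In practice the `classifier` argument would presumably wrap a trained OpenNLP document categorizer (for example `DocumentCategorizerME`); that wiring is left out here because it depends on the OpenNLP version in use. Counting only the top category per paragraph is one simple choice; averaging the per-paragraph probabilities would be another.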
