hivemall-issues mailing list archives

From helenahm <...@git.apache.org>
Subject [GitHub] incubator-hivemall issue #93: Maximum Entropy Model
Date Wed, 05 Jul 2017 01:26:48 GMT
Github user helenahm commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/93
  
    In Hivemall-126: 
    
    Max Entropy Classifier (a.k.a. Multi-nominal/Multiclass Logistic Regression) [1,2] is useful for Text classification.
    
    The Max Entropy Classifier is more often used for Part-of-Speech (POS) Tagging, Named Entity
    Recognition, and other tasks where the context is used as features. Those are also fundamental
    NLP tasks, although Text Classification is a candidate too.
    
    Mohri and his colleagues also put the POS task first. As Mohri writes in the article that is
    the basis for the implementation I have chosen:
    
    Our first set of experiments were carried out with “medium” scale data sets containing
    1M-300M instances. These included: English part-of-speech tagging, generated from the Penn
    Treebank [16] using the first character of each part-of-speech tag as output, sections 2-21
    for training, section 23 for testing and a feature representation based on the identity,
    affixes, and orthography of the input word and the words in a window of size two; Sentiment
    analysis, generated from a set of online product, service, and merchant reviews with a
    three-label output (positive, negative, neutral), with a bag of words feature representation;
    RCV1-v2 as described by [14], where documents having multiple labels were included multiple
    times, once for each label; Acoustic Speech Data, a 39-dimensional input consisting of 13 PLP
    coefficients, plus their first and second derivatives, and 129 outputs (43 phones × 3 acoustic
    states); and the Deja News Archive, a text topic classification problem generated from a
    collection of Usenet discussion forums from the years 1995-2000. For all text experiments, we
    used random feature mixing [9, 20] to control the size of the feature space.
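
To make the quoted POS feature representation concrete, here is a rough Python sketch of context features (identity, affixes, simple orthography, a window of two words on each side) that are then hashed into a fixed number of buckets. It is only an illustration under my own assumptions: the function names, the affix length of 3, the two orthography checks, the `<PAD>` token, and the use of MD5 for bucketing are all hypothetical. They are not taken from the paper or from Hivemall, and hashing is used here as one common way to get the effect of random feature mixing.

```python
import hashlib


def token_features(tokens, i, window=2, affix_len=3):
    """Build string features for tokens[i] from the word and its neighbours.

    Hypothetical feature templates: identity, prefix, suffix, capitalisation,
    and digit presence for every position in the window.
    """
    feats = []
    for off in range(-window, window + 1):
        j = i + off
        # Positions outside the sentence get an assumed padding token.
        word = tokens[j] if 0 <= j < len(tokens) else "<PAD>"
        feats.append(f"w[{off}]={word.lower()}")
        feats.append(f"prefix[{off}]={word[:affix_len].lower()}")
        feats.append(f"suffix[{off}]={word[-affix_len:].lower()}")
        feats.append(f"is_cap[{off}]={word[:1].isupper()}")
        feats.append(f"has_digit[{off}]={any(c.isdigit() for c in word)}")
    return feats


def hash_features(feats, num_buckets=2 ** 18):
    """Map string features to a sparse {bucket_index: count} vector.

    MD5 is used only to get a hash that is stable across runs; distinct
    features may collide in the same bucket, which bounds the feature space.
    """
    vec = {}
    for f in feats:
        idx = int(hashlib.md5(f.encode("utf-8")).hexdigest(), 16) % num_buckets
        vec[idx] = vec.get(idx, 0) + 1
    return vec


if __name__ == "__main__":
    sentence = ["The", "cat", "sat", "on", "the", "mat"]
    features = token_features(sentence, 1)
    print(features[:5])
    print(len(hash_features(features, num_buckets=1024)))
```

The sparse vector from `hash_features` is the kind of input a multiclass logistic regression (max entropy) trainer would consume, with one such vector per token and the POS tag as the label.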


