mahout-dev mailing list archives

From "Grant Ingersoll (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAHOUT-92) BayesFeatureMapper doesn't properly extract features
Date Sat, 01 Nov 2008 20:07:44 GMT

    [ https://issues.apache.org/jira/browse/MAHOUT-92?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644534#action_12644534 ]

Grant Ingersoll commented on MAHOUT-92:
---------------------------------------

{quote}
I also ran a train and a test over 20 newsgroups. Everything seems working at the moment.
{quote}

Can you share how you are running it?  When I run it, it completes, but all the results are
"unknown".  Please update http://cwiki.apache.org/confluence/display/MAHOUT/TwentyNewsgroups
if you have the time.

I'm looking at the BayesClassifier class, and I frankly don't get how it works anymore, especially
the code at:
{code}
for (String category : categories) {
  double prob = documentProbability(model, category, document);
  if (prob < min) {
    min = prob;
    result.setLabel(category);
  }
}
{code}

That min value starts at 0, and a probability should be between 0 and 1, so how would that
clause ever be satisfied such that the label gets set?  Additionally, the values that come
back for prob are much larger than one.  That's fine if they are supposed to be, but then we
shouldn't be calling it a probability.
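
To make the concern concrete, here is a minimal, self-contained sketch. It is not Mahout code: the class ArgMinSketch, both method names, and the score values are hypothetical, and it assumes that documentProbability returns a non-negative, cost-like score where smaller is better. Under that assumption, a minimum that starts at 0 is never beaten, while one that starts at positive infinity selects the lowest-scoring category.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not Mahout code. The map values stand in for whatever
// documentProbability(model, category, document) returns, assumed here to be
// non-negative scores where a smaller value means a better match.
class ArgMinSketch {

    // Mirrors the loop quoted above with min initialized to 0.
    static String labelWithZeroStart(Map<String, Double> scores) {
        double min = 0.0;
        String label = null; // stays null unless some score is below 0
        for (Map.Entry<String, Double> entry : scores.entrySet()) {
            if (entry.getValue() < min) {
                min = entry.getValue();
                label = entry.getKey();
            }
        }
        return label;
    }

    // Same loop, but min starts at +infinity, so any finite score replaces it.
    static String labelWithInfinityStart(Map<String, Double> scores) {
        double min = Double.POSITIVE_INFINITY;
        String label = null;
        for (Map.Entry<String, Double> entry : scores.entrySet()) {
            if (entry.getValue() < min) {
                min = entry.getValue();
                label = entry.getKey();
            }
        }
        return label;
    }

    public static void main(String[] args) {
        // Made-up scores, chosen only to be larger than one.
        Map<String, Double> scores = new LinkedHashMap<>();
        scores.put("sci.space", 812.4);
        scores.put("rec.autos", 640.9);
        scores.put("talk.politics.misc", 955.1);

        System.out.println(labelWithZeroStart(scores));
        System.out.println(labelWithInfinityStart(scores));
    }
}
```

With these made-up scores, the first call returns null because no score is below 0, and the second returns rec.autos. If the real classifier behaves like the first method, the label may simply never be replaced, which might explain the "unknown" results, though that is a guess until the initial value of min and the meaning of the returned score are confirmed.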


> BayesFeatureMapper doesn't properly extract features
> ----------------------------------------------------
>
>                 Key: MAHOUT-92
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-92
>             Project: Mahout
>          Issue Type: Bug
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 0.1
>
>         Attachments: MAHOUT-92.patch, MAHOUT-92.patch
>
>
> The BayesFeatureMapper currently has a bunch of unused variables and doesn't actually
> do anything.  The problem is it is not using the input value to generate a set of n-grams,
> from which it can then generate tf-idf information.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

