mahout-dev mailing list archives

From "Robin Anil (JIRA)" <>
Subject [jira] Updated: (MAHOUT-92) BayesFeatureMapper doesn't properly extract features
Date Sat, 01 Nov 2008 06:15:44 GMT


Robin Anil updated MAHOUT-92:

    Attachment: MAHOUT-92.patch

To test:
 hadoop jar build/apache-mahout-examples-0.1-dev.job org.apache.mahout.classifier.bayes.TestClassifier \
   -p 20newsmodel -t ../core/work/20news-18828-collapse/ -ng 1 -type cbayes -a org.apache.lucene.analysis.standard.StandardAnalyzer \
   -d default -e UTF-8

Some lines were missing from the last patch I submitted in MAHOUT-60, so BayesFeatureMapper
wasn't creating any output.
This patch fixes that. I also ran a train and a test over 20 newsgroups. Everything seems
to be working at the moment.

Why are the encoding and the analyzer required options on the command line?
I feel they should be optional.

The same goes for the default category. The classifier returns the first category if all the
categories have the same score or a score of zero. I don't see any problem with that.
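
The tie-breaking behaviour described here can be sketched roughly as follows. This is only an illustration, not the actual Mahout classifier code: the class and method names (TieBreakSketch, pickCategory) are hypothetical, and it assumes that categories are visited in a fixed order and that a higher score is better.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; not taken from the Mahout code base.
final class TieBreakSketch {

    // Returns the category with the highest score. Only a strictly greater
    // score replaces the current best, so when every category has the same
    // score (including all zeros) the first category in iteration order wins.
    static String pickCategory(Map<String, Double> scores) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> entry : scores.entrySet()) {
            if (best == null || entry.getValue() > bestScore) {
                best = entry.getKey();
                bestScore = entry.getValue();
            }
        }
        return best; // null only when there are no categories at all
    }
}
```

With a rule like this, an explicit default category would only matter if the caller wanted a tie to map to something other than the first category.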

 Any thoughts?

> BayesFeatureMapper doesn't properly extract features
> ----------------------------------------------------
>                 Key: MAHOUT-92
>                 URL:
>             Project: Mahout
>          Issue Type: Bug
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 0.1
>         Attachments: MAHOUT-92.patch
>
> The BayesFeatureMapper currently has a bunch of unused variables and doesn't actually
> do anything.  The problem is it is not using the input value to generate a set of n-grams,
> from which it can then generate tf-idf information.
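
As a rough illustration of the n-gram step the issue refers to, the sketch below turns a list of already-analyzed tokens into all n-grams up to a maximum size. It is not the BayesFeatureMapper implementation: the names NGramSketch, ngrams and maxN are hypothetical, and tokenization (for example by a Lucene analyzer) is assumed to have happened beforehand.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not taken from the Mahout code base.
final class NGramSketch {

    // For each start position, emits the 1-gram, 2-gram, ... up to maxN-gram
    // beginning at that token, with tokens joined by a single space.
    static List<String> ngrams(List<String> tokens, int maxN) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            StringBuilder gram = new StringBuilder();
            for (int n = 1; n <= maxN && start + n <= tokens.size(); n++) {
                if (n > 1) {
                    gram.append(' ');
                }
                gram.append(tokens.get(start + n - 1));
                out.add(gram.toString());
            }
        }
        return out;
    }
}
```

Counting how often each emitted n-gram occurs per document, and in how many documents it occurs, would then give the term and document frequencies that tf-idf is built from.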

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
