mahout-dev mailing list archives

From "Grant Ingersoll (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAHOUT-92) BayesFeatureMapper doesn't properly extract features
Date Sat, 01 Nov 2008 12:39:44 GMT

    [ https://issues.apache.org/jira/browse/MAHOUT-92?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644507#action_12644507 ]

Grant Ingersoll commented on MAHOUT-92:
---------------------------------------

Cool, this looks almost exactly like the patch I came up with based on looking at some of
the old patches.

{code}
Why is encoding and analyzer a required option in the command line?
{code}

In my original patch, I believe it started off by tokenizing/filtering the text using any
specified Lucene Analyzer.  I think this piece would be useful to restore.  This way, you
aren't just relying on a simple whitespace tokenizer and can plug in your own very easily.
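
To make the pluggable idea concrete, here is a rough, self-contained sketch. It does not use the real Lucene Analyzer API: TextAnalyzer, WhitespaceTextAnalyzer, LowercasingTextAnalyzer and FeatureExtractorSketch are all hypothetical names invented for illustration, and a real patch would presumably delegate to whatever Lucene Analyzer the user names on the command line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for a Lucene Analyzer: turns raw text into tokens.
interface TextAnalyzer {
    List<String> analyze(String text);
}

// Baseline behaviour: split on runs of whitespace and nothing else.
final class WhitespaceTextAnalyzer implements TextAnalyzer {
    @Override
    public List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }
}

// A slightly richer analyzer: lowercases each token and strips
// leading and trailing punctuation.
final class LowercasingTextAnalyzer implements TextAnalyzer {
    private final TextAnalyzer delegate = new WhitespaceTextAnalyzer();

    @Override
    public List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : delegate.analyze(text)) {
            String cleaned = token.toLowerCase(Locale.ROOT)
                    .replaceAll("^\\p{Punct}+|\\p{Punct}+$", "");
            if (!cleaned.isEmpty()) {
                tokens.add(cleaned);
            }
        }
        return tokens;
    }
}

// Hypothetical feature extractor: the analyzer is injected, so swapping
// tokenization strategies needs no change to the extraction code.
final class FeatureExtractorSketch {
    private final TextAnalyzer analyzer;

    FeatureExtractorSketch(TextAnalyzer analyzer) {
        this.analyzer = analyzer;
    }

    List<String> features(String text) {
        return analyzer.analyze(text);
    }
}
```

With the whitespace analyzer, "Hello, World!" keeps its punctuation and case; with the lowercasing analyzer the same input yields "hello" and "world". The extractor itself is unchanged in both cases, which is the point of keeping the analyzer as an option.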

{code}
The same goes for the default category. The classifier returns the first category if all the
categories have same score or zero. I don't see any problem in that.
{code}

The default category covered the case where there isn't sufficient evidence for a category.
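
As a minimal sketch of that fallback (not the code from either patch): DefaultCategorySketch and pick are hypothetical names, and treating "insufficient evidence" as "no categories, or every category tied on the same score" is an assumption based on the quoted comment above.

```java
import java.util.Map;

final class DefaultCategorySketch {

    // Returns the highest-scoring category, or defaultCategory when the
    // scores give no basis for a choice. Assumption: "no basis" means the
    // map is empty, or there are several categories and all share one
    // score (which includes the all-zero case).
    static String pick(Map<String, Double> scores, String defaultCategory) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        double minScore = Double.POSITIVE_INFINITY;

        for (Map.Entry<String, Double> entry : scores.entrySet()) {
            double score = entry.getValue();
            if (score > bestScore) {
                bestScore = score;
                best = entry.getKey();
            }
            if (score < minScore) {
                minScore = score;
            }
        }

        boolean allTied = scores.size() > 1 && bestScore == minScore;
        if (best == null || allTied) {
            return defaultCategory;
        }
        return best;
    }
}
```

Without the fallback, a tie would resolve to whichever category happens to come first, which says nothing about the document being classified.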

> BayesFeatureMapper doesn't properly extract features
> ----------------------------------------------------
>
>                 Key: MAHOUT-92
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-92
>             Project: Mahout
>          Issue Type: Bug
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 0.1
>
>         Attachments: MAHOUT-92.patch
>
>
> The BayesFeatureMapper currently has a bunch of unused variables and doesn't actually
> do anything.  The problem is it is not using the input value to generate a set of n-grams,
> from which it can then generate tf-idf information.
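
For reference, the n-gram step mentioned in the issue description could look roughly like the sketch below. This is not the code in MAHOUT-92.patch; NGramSketch, ngrams and maxN are hypothetical names, and joining tokens with a single space is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

final class NGramSketch {

    // Builds all word n-grams of length 1..maxN from an already
    // tokenized input, shortest n-grams first.
    static List<String> ngrams(List<String> tokens, int maxN) {
        List<String> result = new ArrayList<>();
        for (int n = 1; n <= maxN; n++) {
            for (int start = 0; start + n <= tokens.size(); start++) {
                // Join the window of n consecutive tokens with a space.
                result.add(String.join(" ", tokens.subList(start, start + n)));
            }
        }
        return result;
    }
}
```

The resulting n-grams are the features whose counts would feed the tf-idf computation.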

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

