mahout-dev mailing list archives

From "Grant Ingersoll (JIRA)" <>
Subject [jira] Commented: (MAHOUT-92) BayesFeatureMapper doesn't properly extract features
Date Sat, 01 Nov 2008 12:39:44 GMT


Grant Ingersoll commented on MAHOUT-92:

Cool, this looks almost exactly like the patch I came up with based on looking at some of
the old patches.

Why are encoding and analyzer required options on the command line?

In my original patch, I believe it started off by tokenizing/filtering the text using any
specified Lucene Analyzer.  I think this piece would be useful to restore.  This way, you
aren't just relying on a simple whitespace tokenizer and can plug in your own very easily.
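
As a rough illustration of the pluggable-tokenizer idea, here is a minimal sketch in plain Java. It does not use Lucene: the Function<String, List<String>> type is only a hypothetical stand-in for an Analyzer, and the class and method names (PluggableTokenizerSketch, extractTokens, WHITESPACE, LOWERCASE_WORDS) are made up for this example and are not taken from the patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

public class PluggableTokenizerSketch {

    // Hypothetical stand-in for a simple whitespace tokenizer.
    static final Function<String, List<String>> WHITESPACE = text -> {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    };

    // Hypothetical stand-in for a richer analyzer: lowercases the text and
    // splits on anything that is not a letter or digit.
    static final Function<String, List<String>> LOWERCASE_WORDS = text -> {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    };

    // The feature extractor depends only on the tokenizer it is handed,
    // so callers can swap in their own without touching this method.
    static List<String> extractTokens(String text, Function<String, List<String>> tokenizer) {
        return tokenizer.apply(text);
    }

    public static void main(String[] args) {
        String text = "Hello, World! Hello Mahout.";
        System.out.println(extractTokens(text, WHITESPACE));
        System.out.println(extractTokens(text, LOWERCASE_WORDS));
    }
}
```

With the whitespace version, punctuation stays attached to the tokens ("Hello," and "Hello" are different features); the second tokenizer folds them together, which is the kind of difference a configurable analyzer is meant to address.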

The same goes for the default category. The classifier returns the first category if all the
categories have the same score or a score of zero. I don't see any problem in that.

The default category covered the case where there isn't sufficient evidence for a category.
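
To make the difference concrete, here is a minimal sketch of a default-category fallback. It is not the Mahout classifier: the names (DefaultCategorySketch, classify) are hypothetical, and it assumes that scores are non-negative and that a higher score means a better match, which may not hold for the real implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DefaultCategorySketch {

    // Returns the best-scoring category, or defaultCategory when the scores
    // carry no evidence: no scores at all, every score zero, or every
    // category tied on the same score.
    // Assumption: scores are non-negative and higher means a better match.
    static String classify(Map<String, Double> scores, String defaultCategory) {
        String best = null;
        double max = Double.NEGATIVE_INFINITY;
        double min = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, Double> e : scores.entrySet()) {
            double s = e.getValue();
            if (s > max) {
                max = s;
                best = e.getKey();
            }
            if (s < min) {
                min = s;
            }
        }
        boolean allTied = scores.size() > 1 && min == max;
        if (best == null || max == 0.0 || allTied) {
            return defaultCategory;
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Double> clear = new LinkedHashMap<>();
        clear.put("sports", 2.5);
        clear.put("politics", 1.0);

        Map<String, Double> noEvidence = new LinkedHashMap<>();
        noEvidence.put("sports", 0.0);
        noEvidence.put("politics", 0.0);

        System.out.println(classify(clear, "unknown"));
        System.out.println(classify(noEvidence, "unknown"));
    }
}
```

Without the fallback, the second call would return "sports" only because it happens to be first in the map, which says nothing about the document.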

> BayesFeatureMapper doesn't properly extract features
> ----------------------------------------------------
>                 Key: MAHOUT-92
>                 URL:
>             Project: Mahout
>          Issue Type: Bug
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 0.1
>         Attachments: MAHOUT-92.patch
> The BayesFeatureMapper currently has a bunch of unused variables and doesn't actually
> do anything.  The problem is it is not using the input value to generate a set of n-grams,
> from which it can then generate tf-idf information.
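
For readers unfamiliar with the step described in the issue, here is a minimal sketch of generating n-grams from a token list and counting their term frequencies. It is not the BayesFeatureMapper code: the names (NGramSketch, ngrams, termFrequencies) are hypothetical, and it covers only the per-document tf part. The idf part needs document counts across the whole corpus and is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NGramSketch {

    // Builds all n-grams of size 1..maxN, joining tokens with a single space.
    static List<String> ngrams(List<String> tokens, int maxN) {
        List<String> out = new ArrayList<>();
        for (int n = 1; n <= maxN; n++) {
            for (int i = 0; i + n <= tokens.size(); i++) {
                out.add(String.join(" ", tokens.subList(i, i + n)));
            }
        }
        return out;
    }

    // Counts how often each n-gram occurs in one document (raw term frequency).
    static Map<String, Integer> termFrequencies(List<String> grams) {
        Map<String, Integer> tf = new LinkedHashMap<>();
        for (String g : grams) {
            tf.merge(g, 1, Integer::sum);
        }
        return tf;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("to", "be", "or", "not", "to", "be");
        Map<String, Integer> tf = termFrequencies(ngrams(tokens, 2));
        System.out.println(tf);
    }
}
```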

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
