mahout-dev mailing list archives

From "Steven Handerson (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAHOUT-60) Complementary Naive Bayes
Date Fri, 18 Jul 2008 19:48:31 GMT

    [ https://issues.apache.org/jira/browse/MAHOUT-60?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12614847#action_12614847 ]

Steven Handerson commented on MAHOUT-60:
----------------------------------------

Robin,

I can get the training working very well. I've even started working with a very
large file (700+ MB, and not done creating yet, the old slow way). No problem.
But I'd say applying the model (classifying new documents) may now need a better
map-reduce treatment. At least I *think* it's working (I've seen it work on smaller
training data), but on my larger task it's getting bogged down.

Maybe I'll think about it or try it myself, though I'm very new to map-reduce. It
seems like you should be able to do something clever by throwing the test data
(feature|doc) and the model data (feature|category, increment) together: reduce on
each feature, emit category increments / decrements for each (doc, category) pair,
and then sum them up in a second reduce.
Or just emit (doc|category, increment) for all features; then the reduce can
easily also find the maximal category for each doc.
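A minimal in-memory sketch of the second variant above, in Java since Mahout is a Java/Hadoop project. It does not use the Hadoop API: HashMap grouping stands in for the shuffle, and the names JoinClassifySketch, TestRecord, ModelRecord and Increment are hypothetical. Pass 1 joins test and model records on the feature key and emits (doc | category, weight); pass 2 sums the weights per category for each doc and keeps the highest score. It assumes a higher summed weight means a better category; a model whose weights should be minimized (as with complement-class weights) would flip the comparison.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class JoinClassifySketch {

    // Hypothetical test-data record: key = feature, value = doc.
    public record TestRecord(String feature, String doc) {}

    // Hypothetical model record: key = feature, value = (category, weight).
    public record ModelRecord(String feature, String category, double weight) {}

    // Value emitted by the join pass: a weight increment for one category.
    public record Increment(String category, double weight) {}

    /** Pass 1: group both inputs by feature (the "shuffle"), then emit doc -> (category, weight). */
    public static Map<String, List<Increment>> joinOnFeature(List<TestRecord> test, List<ModelRecord> model) {
        Map<String, List<String>> docsByFeature = new HashMap<>();
        for (TestRecord t : test) {
            docsByFeature.computeIfAbsent(t.feature(), k -> new ArrayList<>()).add(t.doc());
        }
        Map<String, List<ModelRecord>> weightsByFeature = new HashMap<>();
        for (ModelRecord m : model) {
            weightsByFeature.computeIfAbsent(m.feature(), k -> new ArrayList<>()).add(m);
        }
        // "Reduce" on each feature key: cross its documents with its category weights.
        Map<String, List<Increment>> byDoc = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : docsByFeature.entrySet()) {
            List<ModelRecord> weights = weightsByFeature.getOrDefault(e.getKey(), List.of());
            for (String doc : e.getValue()) {
                for (ModelRecord m : weights) {
                    byDoc.computeIfAbsent(doc, k -> new ArrayList<>())
                         .add(new Increment(m.category(), m.weight()));
                }
            }
        }
        return byDoc;
    }

    /** Pass 2: for each doc, sum the increments per category and keep the best-scoring one. */
    public static Map<String, String> classify(Map<String, List<Increment>> byDoc) {
        Map<String, String> labels = new TreeMap<>();
        for (Map.Entry<String, List<Increment>> e : byDoc.entrySet()) {
            Map<String, Double> scores = new HashMap<>();
            for (Increment inc : e.getValue()) {
                scores.merge(inc.category(), inc.weight(), Double::sum);
            }
            String best = null;
            double bestScore = Double.NEGATIVE_INFINITY;
            for (Map.Entry<String, Double> s : scores.entrySet()) {
                if (s.getValue() > bestScore) {
                    best = s.getKey();
                    bestScore = s.getValue();
                }
            }
            labels.put(e.getKey(), best);
        }
        return labels;
    }

    public static void main(String[] args) {
        List<TestRecord> test = List.of(
            new TestRecord("goal", "d1"), new TestRecord("match", "d1"),
            new TestRecord("vote", "d2"), new TestRecord("match", "d2"));
        List<ModelRecord> model = List.of(
            new ModelRecord("goal", "sports", 2.0), new ModelRecord("goal", "politics", 0.1),
            new ModelRecord("match", "sports", 1.0), new ModelRecord("match", "politics", 0.5),
            new ModelRecord("vote", "sports", 0.2), new ModelRecord("vote", "politics", 3.0));
        System.out.println(classify(joinOnFeature(test, model))); // prints {d1=sports, d2=politics}
    }
}
```

The point of keying pass 1 by feature is that neither side has to fit in memory: each reducer only sees the documents and category weights for one feature. Summation and the argmax then happen together in the per-doc reduce, which is the "find the maximal category in the reduce" shortcut.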

I don't think this is what you're doing yet: you're thinking of loading
the model, rather than shoving it through a map/reduce sequence. I think.

> Complementary Naive Bayes
> -------------------------
>
>                 Key: MAHOUT-60
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-60
>             Project: Mahout
>          Issue Type: Sub-task
>          Components: Classification
>            Reporter: Robin Anil
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 0.1
>
>         Attachments: MAHOUT-60.patch, MAHOUT-60.patch, MAHOUT-60.patch, twcnb.jpg
>
>
> The focus is to implement an improved text classifier based on this paper http://people.csail.mit.edu/jrennie/papers/icml03-nb.pdf.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

