mahout-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <>
Subject [jira] [Commented] (MAHOUT-1493) Port Naive Bayes to the Spark DSL
Date Sun, 05 Apr 2015 03:36:33 GMT


ASF GitHub Bot commented on MAHOUT-1493:

GitHub user andrewpalumbo opened a pull request:

    MAHOUT-1493 parallelize SparkNaiveBayes.test(...)

    Explicitly define math-scala NaiveBayes.test(...) as sequential and in memory.  Extend
test(...) into SparkNaiveBayes and distribute the classification process. Also some general

You can merge this pull request into a Git repository by running:

    $ git pull MAHOUT-1493-serialize

Alternatively you can review and apply these changes as the patch at:

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #104

commit 98fc94484a408ecc9e433babaee772258ba9bae9
Author: Andrew Palumbo <>
Date:   2015-04-05T01:27:47Z

    add a prefix directory for model.dfsWrite(..) to write components. added tests for full
training and testing of seeded random toy TFIDF data

commit 1049b461464ab27a56f742d96c5b5c73b040ec1b
Author: Andrew Palumbo <>
Date:   2015-04-05T01:33:23Z

    use SparkNaiveBayes rather than NaiveBayes in CLI drivers to avoid confusion

commit 6c5dc8f3359255f85fbbbe2ffb24681fa489255d
Author: Andrew Palumbo <>
Date:   2015-04-05T02:40:44Z

    override NaiveBayes.test in Spark and broadcast the classifier to the closure.  It
no longer pulls everything into memory up front

commit 2522a029ecbf6389f09ab69f80f05a298bcd7ea0
Author: Andrew Palumbo <>
Date:   2015-04-05T02:55:08Z

    Make math-scala NaiveBayes.test(...) explicitly sequential

commit bad6d9a06e3e04cfdafcce7d1adc2066ecd991b7
Author: Andrew Palumbo <>
Date:   2015-04-05T03:29:36Z



> Port Naive Bayes to the Spark DSL
> ---------------------------------
>                 Key: MAHOUT-1493
>                 URL:
>             Project: Mahout
>          Issue Type: Bug
>          Components: Classification
>            Reporter: Sebastian Schelter
>            Assignee: Andrew Palumbo
>              Labels: DSL, h2o, scala
>             Fix For: 0.10.0
>         Attachments: MAHOUT-1493.patch, MAHOUT-1493.patch, MAHOUT-1493.patch, MAHOUT-1493.patch,
> Port our Naive Bayes implementation to the new Spark DSL. Shouldn't require more than a few lines of code.

This message was sent by Atlassian JIRA
