mahout-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAHOUT-1856) Create a framework for new Mahout Clustering, Classification, and Optimization Algorithms
Date Thu, 04 Aug 2016 15:51:20 GMT

    [ https://issues.apache.org/jira/browse/MAHOUT-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15407985#comment-15407985
] 

ASF GitHub Bot commented on MAHOUT-1856:
----------------------------------------

GitHub user rawkintrevo opened a pull request:

    https://github.com/apache/mahout/pull/246

    [MAHOUT-1856][WIP] Create a framework for new Mahout Clustering, Classification, and Optimization
Algorithms 

    Relevant JIRA: [https://issues.apache.org/jira/browse/MAHOUT-1856](https://issues.apache.org/jira/browse/MAHOUT-1856)
    
    Readme.md provides a more comprehensive (yet still incomplete) overview.
    
    Key Points:
    Top-level class:
    Model exposes one method, fit, plus coefs.
    
    Transformers map a vector input to a vector output (same or different length)
    Regressors map a vector input to a single output (e.g. a Double)
    Classifiers extend Transformers that produce a probability vector: they 'select' the
class and return its label (instead of the entire p-vector)
    
    Pipelines and Ensembles are models as well, except they are composed from other models
listed above, or from other pipelines and ensembles.
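    
    A minimal sketch of the hierarchy described above, written in Python purely for illustration (the PR itself targets Mahout's own code base). Only Model, fit, and coefs come from the description; transform, predict, classify, Softmax, and the Pipeline constructor are assumed names, not the PR's API.

```python
import math
from abc import ABC, abstractmethod
from typing import List, Optional, Sequence

Vector = List[float]


class Model(ABC):
    """Top-level type: one method (fit) plus the learned coefficients."""

    def __init__(self) -> None:
        self.coefs: Vector = []

    @abstractmethod
    def fit(self, rows: Sequence[Vector],
            targets: Optional[Sequence[float]] = None) -> "Model":
        """Learn from data and return self."""


class Transformer(Model):
    """Maps a vector to a vector of the same or a different length."""

    @abstractmethod
    def transform(self, x: Vector) -> Vector:
        """Return the transformed vector."""


class Regressor(Model):
    """Maps a vector to a single number."""

    @abstractmethod
    def predict(self, x: Vector) -> float:
        """Return the predicted value."""


class Softmax(Transformer):
    """Toy transformer (assumed): turns raw scores into a probability vector."""

    def fit(self, rows, targets=None):
        return self  # nothing to learn in this toy example

    def transform(self, x):
        peak = max(x)  # subtract the max for numerical stability
        exps = [math.exp(v - peak) for v in x]
        total = sum(exps)
        return [e / total for e in exps]


class SoftmaxClassifier(Softmax):
    """Extends the probability-producing transformer and returns only the label."""

    def __init__(self, labels: Sequence[str]) -> None:
        super().__init__()
        self.labels = list(labels)

    def classify(self, x: Vector) -> str:
        probs = self.transform(x)
        return self.labels[probs.index(max(probs))]


class Pipeline(Transformer):
    """A model composed of other transformers (or pipelines), applied in order."""

    def __init__(self, stages: Sequence[Transformer]) -> None:
        super().__init__()
        self.stages = list(stages)

    def fit(self, rows, targets=None):
        current = [list(r) for r in rows]
        for stage in self.stages:
            stage.fit(current, targets)
            # each later stage is fit on the previous stage's output
            current = [stage.transform(r) for r in current]
        return self

    def transform(self, x):
        for stage in self.stages:
            x = stage.transform(x)
        return x
```

    Because Pipeline is itself a Transformer in this sketch, a pipeline can be used as a stage of another pipeline, which is one way to read "composed from other pipelines".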
    
    ToDo:
    - [ ] All models need a uniform way to expose their tuning parameters -> this will
be required for an auto-tuning algo.  
    - [ ] Pipelines / Ensembles must be able to account for and report the tunable parameters
of their sub-models
    - [ ] Need fitness functions
    - [ ] Native method wrappers: underlying engines and third-party packages already implement
many ML models, so let's not reinvent the wheel by exposing YET ANOTHER SGD algorithm. Instead, we
should be able to convert a matrix to the format the 'other' library expects, run the model, get the
results, package them back into a matrix, and pass that on in a pipeline or ensemble. (This is
especially useful for DeepLearning4J integration.) Also, native implementations of some algos on an
engine are probably more efficient than implementations we would write, because they leverage
engine-specific tricks (think Flink delta iterations). 
    - [ ] Lots more, open for discussion. 
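    
    One possible shape for the first two ToDo items, sketched in Python for illustration only. Everything here is hypothetical: the Tunable mixin, get_params / set_params, the dotted "stage.param" key convention, and the ToyScaler / ToyRidge placeholder models are assumptions, not part of this PR.

```python
from typing import Any, Dict, List


class Tunable:
    """Hypothetical mixin: a model declares its tunable parameters by name."""

    tunable: List[str] = []

    def get_params(self) -> Dict[str, Any]:
        return {name: getattr(self, name) for name in self.tunable}

    def set_params(self, values: Dict[str, Any]) -> "Tunable":
        for name, value in values.items():
            if name not in self.tunable:
                raise KeyError(f"{name!r} is not a tunable parameter")
            setattr(self, name, value)
        return self


class ToyScaler(Tunable):
    """Placeholder model with a single tunable flag."""

    tunable = ["with_mean"]

    def __init__(self, with_mean: bool = True) -> None:
        self.with_mean = with_mean


class ToyRidge(Tunable):
    """Placeholder model with a single tunable number."""

    tunable = ["alpha"]

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha


class Pipeline(Tunable):
    """Reports sub-model parameters under 'stageName.param' keys."""

    def __init__(self, stages: Dict[str, Tunable]) -> None:
        self.stages = dict(stages)

    def get_params(self) -> Dict[str, Any]:
        params: Dict[str, Any] = {}
        for stage_name, stage in self.stages.items():
            for key, value in stage.get_params().items():
                params[f"{stage_name}.{key}"] = value
        return params

    def set_params(self, values: Dict[str, Any]) -> "Pipeline":
        for key, value in values.items():
            # split off the first path segment and delegate the rest,
            # so nested pipelines resolve their own sub-keys
            stage_name, _, rest = key.partition(".")
            self.stages[stage_name].set_params({rest: value})
        return self
```

    With a flat key space like this, an auto-tuner would only need get_params to enumerate the search space and set_params to apply a candidate, regardless of how deeply the models are nested.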
    
    This is merely a conversation starter on what to do.  
    
    I've included OLS as an example regressor and a normalizer as an example transformer,
only for illustrative purposes.  I really don't want to pack too many algos into this initial
commit, just an example / proof of concept so we can say, yes, this framework makes sense for
this kind of model OR ooh, we probably want to have these features too. 
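    
    For readers who want to see what those two examples amount to, here is a small stand-alone Python sketch. It is not the code in this PR: it assumes the normalizer rescales a vector to unit L2 norm, and it fits OLS through the normal equations with a hand-written solver; the class and method names are illustrative.

```python
import math
from typing import List, Sequence

Vector = List[float]


class Normalizer:
    """Example transformer: rescales a vector to unit L2 norm (assumed definition)."""

    def fit(self, rows: Sequence[Vector]) -> "Normalizer":
        return self  # stateless, nothing to learn

    def transform(self, x: Vector) -> Vector:
        norm = math.sqrt(sum(v * v for v in x))
        return list(x) if norm == 0.0 else [v / norm for v in x]


def solve(a: List[Vector], b: Vector) -> Vector:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: features are collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


class OLS:
    """Example regressor: ordinary least squares via the normal equations."""

    def __init__(self) -> None:
        self.coefs: Vector = []  # intercept first, then one weight per feature

    def fit(self, rows: Sequence[Vector], targets: Sequence[float]) -> "OLS":
        design = [[1.0] + list(r) for r in rows]  # prepend intercept column
        k = len(design[0])
        xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)]
               for i in range(k)]
        xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(k)]
        self.coefs = solve(xtx, xty)  # solves (X^T X) b = X^T y
        return self

    def predict(self, x: Vector) -> float:
        return self.coefs[0] + sum(w * v for w, v in zip(self.coefs[1:], x))
```

    The regressor maps a vector to a single number and the normalizer maps a vector to a vector, which matches the Regressor / Transformer split in the key points.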
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rawkintrevo/mahout mahout-1856

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/mahout/pull/246.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #246
    
----
commit 6c0f6bd322a50341bcc587750146467f9ff3fa0a
Author: rawkintrevo <trevor.d.grant@gmail.com>
Date:   2016-08-01T00:08:16Z

    [MAHOUT-1856] ML Algo Framework

commit 1f04cd5436df12ded23b8a1815b93ce73ea2a32a
Author: rawkintrevo <trevor.d.grant@gmail.com>
Date:   2016-08-02T17:22:48Z

    Building framework

commit 33b90c9795bbb1ff381a98045b0d5f2b641693a9
Author: rawkintrevo <trevor.d.grant@gmail.com>
Date:   2016-08-02T23:09:30Z

    add placeholders for ensemble pipeline and fitness test

commit 83c6068e2aa18a62f6ae8b84169a018f764ab408
Author: rawkintrevo <trevor.d.grant@gmail.com>
Date:   2016-08-03T14:54:32Z

    added readme

commit 52e9c3e1df4db1397ab81bf07c0e191cfd229b1a
Author: rawkintrevo <trevor.d.grant@gmail.com>
Date:   2016-08-03T14:58:59Z

    fixed readme image

commit 92ceeb9603ff9c4927214b896c4dbcfc63f8c7c4
Author: rawkintrevo <trevor.d.grant@gmail.com>
Date:   2016-08-03T15:04:11Z

    fixed readme image

commit c0b0464f45470375d709ef9475d474440411879f
Author: rawkintrevo <trevor.d.grant@gmail.com>
Date:   2016-08-03T15:04:52Z

    fixed readme image

commit 6f0228aa7ff349cd8ff5c10a4dafe55ec2037ee4
Author: rawkintrevo <trevor.d.grant@gmail.com>
Date:   2016-08-04T15:36:53Z

    removed autogen comments from files

commit 065fb24068e5e98b24f4f53ab8cb312abfb8b9ed
Author: rawkintrevo <trevor.d.grant@gmail.com>
Date:   2016-08-01T00:08:16Z

    [MAHOUT-1856] ML Algo Framework

commit 127d5dec29ac8b7d6ad3a12c494d4ccdae24cd31
Author: rawkintrevo <trevor.d.grant@gmail.com>
Date:   2016-08-02T17:22:48Z

    Building framework

commit 557af2ee7bec17b176c6def768ea6d3da8495b42
Author: rawkintrevo <trevor.d.grant@gmail.com>
Date:   2016-08-02T23:09:30Z

    add placeholders for ensemble pipeline and fitness test

commit bde4c940f3e540ffb2e8eceb87355638ca157f89
Author: rawkintrevo <trevor.d.grant@gmail.com>
Date:   2016-08-03T14:54:32Z

    added readme

commit 565a164082b3c00294db2a4bd1a0b001d561d6f9
Author: rawkintrevo <trevor.d.grant@gmail.com>
Date:   2016-08-03T14:58:59Z

    fixed readme image

commit 950027c047021c23f44af64b842bcbc1bbd717f9
Author: rawkintrevo <trevor.d.grant@gmail.com>
Date:   2016-08-03T15:04:11Z

    fixed readme image

commit 045192146e290d9762f09e4235dd4c2f947891d4
Author: rawkintrevo <trevor.d.grant@gmail.com>
Date:   2016-08-03T15:04:52Z

    fixed readme image

commit f65d7a941f666d0a58d56ac642558dd15fb57cd7
Author: rawkintrevo <trevor.d.grant@gmail.com>
Date:   2016-08-04T15:36:53Z

    removed autogen comments from files

commit 842db7ec3c21e5a4d1d152f1150b0dc97e5f44e7
Author: rawkintrevo <trevor.d.grant@gmail.com>
Date:   2016-08-04T15:38:19Z

    Merge branch 'mahout-1856' of https://github.com/rawkintrevo/mahout into mahout-1856

----


> Create a framework for new Mahout Clustering, Classification, and Optimization  Algorithms
> ------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-1856
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1856
>             Project: Mahout
>          Issue Type: New Feature
>    Affects Versions: 0.12.1
>            Reporter: Andrew Palumbo
>            Assignee: Trevor Grant
>            Priority: Critical
>             Fix For: 0.13.0
>
>
> To ensure that Mahout does not become "A loose bag of algorithms", create basic traits
> with functions common to each class of algorithm. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
