mahout-dev mailing list archives

From "Paritosh Ranjan (Updated) (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MAHOUT-929) Refactor Clustering (Vector Classification) into a Separate Postprocess with Outlier Pruning
Date Sat, 18 Feb 2012 15:15:00 GMT

     [ https://issues.apache.org/jira/browse/MAHOUT-929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paritosh Ranjan updated MAHOUT-929:
-----------------------------------

    Attachment: Mahout-929

I have added an emitMostLikely feature to vector classification. If clusterClassificationThreshold
is present, only vectors whose pdfs are greater than clusterClassificationThreshold
are classified. This is a bit different from the previous implementation, but it makes more
sense if you think in terms of outlier removal.
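
To illustrate the idea, here is a minimal sketch of threshold-based classification. It is not the code in the patch: the class name, method name, and signature are hypothetical, and it assumes each vector comes with one pdf value per cluster. With emitMostLikely set, only the cluster with the highest pdf is considered; otherwise every cluster whose pdf exceeds the threshold is emitted. A vector with no qualifying cluster is treated as an outlier and left unclassified.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; these names are illustrative and are not Mahout's API.
class ThresholdClassificationSketch {

  /**
   * Returns the indices of the clusters a vector is assigned to,
   * given its pdf value for each cluster. An empty list means the
   * vector is treated as an outlier and is not classified.
   */
  static List<Integer> classify(double[] pdfs, double threshold, boolean emitMostLikely) {
    List<Integer> assigned = new ArrayList<>();
    if (emitMostLikely) {
      // Find the single cluster with the highest pdf.
      int best = -1;
      for (int i = 0; i < pdfs.length; i++) {
        if (best < 0 || pdfs[i] > pdfs[best]) {
          best = i;
        }
      }
      // Emit it only if it clears the threshold.
      if (best >= 0 && pdfs[best] > threshold) {
        assigned.add(best);
      }
    } else {
      // Emit every cluster whose pdf clears the threshold.
      for (int i = 0; i < pdfs.length; i++) {
        if (pdfs[i] > threshold) {
          assigned.add(i);
        }
      }
    }
    return assigned;
  }

  public static void main(String[] args) {
    double[] pdfs = {0.10, 0.65, 0.25};
    System.out.println(classify(pdfs, 0.2, true));   // prints [1]
    System.out.println(classify(pdfs, 0.2, false));  // prints [1, 2]
    System.out.println(classify(pdfs, 0.7, true));   // prints [] (outlier)
  }
}
```

The non-emitMostLikely branch is what would let soft clusterings such as FuzzyKMeans assign one vector to several clusters, while the threshold check prunes outliers in both modes.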

So, even Dirichlet and FuzzyKMeans can be classified now.

For now, the patch contains changes and test cases only for the sequential version. I will
make the corresponding changes to the MapReduce version, with test cases, and submit them soon.
                
> Refactor Clustering (Vector Classification) into a Separate Postprocess with Outlier Pruning
> --------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-929
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-929
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Classification, Clustering
>    Affects Versions: 0.6
>            Reporter: Jeff Eastman
>            Assignee: Jeff Eastman
>             Fix For: 0.7
>
>         Attachments: Mahout-929, Mahout-929, Mahout-929, Mahout-929
>
>
> The current clustering drivers have a -cp option to produce clusteredPoints directory
> containing the input vectors classified by the final clusters produced by the algorithm. These
> options are redundantly implemented in those drivers.
> - Factor out & implement an independent post processor to perform the classification
>   step independently of the various clustering implementations.
> - Implement a pluggable outlier removal capability for this classifier.
> - Consider building off of the ClusterClassifier & ClusterIterator ideas.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
