mahout-dev mailing list archives

From "Paritosh Ranjan (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAHOUT-843) Top Down Clustering
Date Mon, 05 Dec 2011 15:52:40 GMT

    [ https://issues.apache.org/jira/browse/MAHOUT-843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162840#comment-13162840 ]

Paritosh Ranjan commented on MAHOUT-843:
----------------------------------------

Hi Jeff,

I have started adding the content to the wiki page https://cwiki.apache.org/MAHOUT/top-down-clustering.html.
Is the location fine?

The execution mode you are talking about is the sequential/mapreduce version, correct? If
yes, then there is a parameter for that. The sequential or the mapreduce version is executed
based on that parameter.

Do I need to fix the KMeansTests and the Javadocs? If yes, could you point out some
methods where the Javadocs are not good? Otherwise, I will just have a second look at them and
try to make them better. Can you also provide the names of the classes where the tests are
failing? (I skip tests when building Mahout, as I use Windows.)
                
> Top Down Clustering
> -------------------
>
>                 Key: MAHOUT-843
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-843
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Clustering
>    Affects Versions: 0.6
>            Reporter: Paritosh Ranjan
>            Assignee: Jeff Eastman
>              Labels: clustering, patch
>             Fix For: 0.6
>
>         Attachments: MAHOUT-843-patch, MAHOUT-843-patch-only-postprocessor, MAHOUT-843-patch-only-postprocessor-v1,
>                      MAHOUT-843-patch-only-postprocessor-v2, MAHOUT-843-patch-only-postprocessor-v3, MAHOUT-843-patch-only-postprocessor-v4,
>                      MAHOUT-843-patch-only-postprocessor-v5, MAHOUT-843-patch-v1, Top-Down-Clustering-patch
>
>
> Top Down Clustering works in multiple steps. The first step is to find comparatively bigger
> clusters. The second step is to cluster the bigger chunks into meaningful clusters. This can
> improve performance when clustering a large amount of data. It also removes the dependency on
> providing input clusters/numbers to the clustering algorithm.
> "Big" is a relative term, and so is the smaller "meaningful". So the size of these
> "bigger" and "smaller/meaningful" clusters will be controlled by the user.
> Which clustering algorithm to use at the top level and which to use at the bottom
> level can also be selected by the user. Initially, this can be done for only one or a few
> clustering algorithms, and later an option can be provided to use all the algorithms that
> suit the case.
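
The two steps described above can be illustrated with a minimal sketch. This is not the code from the MAHOUT-843 patches: it assumes a simple greedy threshold clustering at both levels, and the function names (`threshold_cluster`, `top_down_cluster`) and the radius parameters are hypothetical, chosen only to show how the user controls what counts as "big" and "meaningful".

```python
import math


def threshold_cluster(points, radius):
    """Greedy single-pass clustering (an assumed stand-in algorithm).

    A point joins the first cluster whose seed point lies within `radius`;
    otherwise it seeds a new cluster. No cluster count is given up front.
    """
    clusters = []  # list of (seed, members) pairs
    for p in points:
        for seed, members in clusters:
            if math.dist(seed, p) <= radius:
                members.append(p)
                break
        else:
            clusters.append((p, [p]))
    return [members for _, members in clusters]


def top_down_cluster(points, top_radius, bottom_radius):
    """Step 1: find coarse clusters with a large radius.
    Step 2: re-cluster each coarse chunk with a smaller radius."""
    return [threshold_cluster(chunk, bottom_radius)
            for chunk in threshold_cluster(points, top_radius)]


points = [(0.0, 0.0), (0.5, 0.0), (3.0, 0.0), (3.4, 0.0),
          (20.0, 20.0), (20.3, 20.0), (24.0, 20.0)]

# top_radius controls the "bigger" clusters, bottom_radius the "meaningful" ones.
hierarchy = top_down_cluster(points, top_radius=10.0, bottom_radius=1.0)
for i, sub_clusters in enumerate(hierarchy):
    print(f"top-level cluster {i}: {sub_clusters}")
```

Each top-level chunk is clustered independently in the second step, so in principle the chunks could be processed separately, and a different algorithm could be substituted at either level.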

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
