mahout-dev mailing list archives

From "Paritosh Ranjan (Commented) (JIRA)" <>
Subject [jira] [Commented] (MAHOUT-931) Implement a pluggable outlier removal capability for cluster classifiers
Date Tue, 27 Dec 2011 16:44:30 GMT


Paritosh Ranjan commented on MAHOUT-931:

I am a bit confused.

Are we planning to get rid of the way clustering is currently done, which is
algorithm-specific, i.e. the code in CanopyClusterer?
Will the new clustering strategy be "only" what is implemented in ClusterClassifier, i.e.
calculating the probabilities of a vector belonging to the different models (clusters) and
choosing the model with the highest probability?

If yes, then implementing a clustering policy for each clustering algorithm is all that
is needed. And for outlier removal, just a threshold probability will be needed: vectors
whose probability falls below it won't be clustered. Am I correct?
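
To make the question concrete, here is a minimal sketch of the behaviour I am describing. It
is only an illustration of my understanding, not the ClusterClassifier code: the class name,
the method name and the plain double[] of per-model probabilities are all hypothetical, and it
assumes the probabilities have already been computed for the vector.

```java
import java.util.OptionalInt;

// Hypothetical illustration only; none of these names come from Mahout.
public class ThresholdAssignmentSketch {

    /**
     * Picks the model (cluster) with the highest probability for one vector.
     * Returns empty when even the best probability is below the threshold,
     * i.e. the vector is treated as an outlier and is not clustered.
     */
    public static OptionalInt assign(double[] modelProbabilities, double threshold) {
        int best = -1;
        for (int i = 0; i < modelProbabilities.length; i++) {
            if (best < 0 || modelProbabilities[i] > modelProbabilities[best]) {
                best = i;
            }
        }
        if (best < 0 || modelProbabilities[best] < threshold) {
            return OptionalInt.empty();
        }
        return OptionalInt.of(best);
    }

    public static void main(String[] args) {
        // Best probability 0.7 clears the 0.5 threshold: assigned to model 1.
        System.out.println(assign(new double[] {0.1, 0.7, 0.2}, 0.5));
        // Best probability 0.4 is below the 0.5 threshold: treated as an outlier.
        System.out.println(assign(new double[] {0.4, 0.35, 0.25}, 0.5));
    }
}
```

In this reading, the per-algorithm part is only how the probabilities are produced; the
threshold check is the same regardless of the algorithm.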

Till now, I have been thinking that the clustering code just needs to be refactored out
(without changing the implementation). If this is the case, then I think I have been
proceeding in the correct direction (in terms of design).

However, I suspect that we are not in sync regarding the way of implementation. I think
you want to change the clustering implementation to a cluster classification implementation
with outlier removal (and completely get rid of the algorithm-specific implementation, which
makes sense).

So it would be really helpful if you could clarify my doubts.

> Implement a pluggable outlier removal capability for cluster classifiers
> ------------------------------------------------------------------------
>                 Key: MAHOUT-931
>                 URL:
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Classification, Clustering
>    Affects Versions: 0.6
>            Reporter: Paritosh Ranjan
>             Fix For: 0.7
>         Attachments: MAHOUT-931
> A pluggable outlier removal capability while classifying the clusters is needed. The
> classification and outlier removal implementations, both should be completely separate entities
> for better abstraction.

