mahout-dev mailing list archives

From "Maxim Arap (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAHOUT-1468) Creating a new page for StreamingKMeans documentation on mahout website
Date Wed, 23 Apr 2014 15:57:19 GMT

    [ https://issues.apache.org/jira/browse/MAHOUT-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13978348#comment-13978348 ]

Maxim Arap commented on MAHOUT-1468:
------------------------------------

I have a question about the update rule for distanceCutoff in the
clusterInternal function in StreamingKMeans.java.

The default initial value of distanceCutoff is 1.0 / numClusters, where
numClusters is the initial estimate of the number of clusters that the
streaming step will output. Call this initial value nC0. As the algorithm
runs, numClusters grows, but distanceCutoff grows as beta^r / nC0; in other
words, distanceCutoff still uses the initial value of numClusters. It seems
natural to update the "denominator" of distanceCutoff each time the value of
numClusters changes. What are your thoughts on this?
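
To make the question concrete, here is a minimal sketch of the two rules
being compared. It is not the code from StreamingKMeans.java: the class name,
method names, and sample values are hypothetical, and r is assumed to count
how many times the cutoff has been multiplied by beta.

```java
// Hypothetical sketch of the two cutoff rules discussed above.
// This is NOT Mahout's implementation; names and values are illustrative.
class DistanceCutoffSketch {

    // Rule as described in the comment: the denominator stays fixed at nC0,
    // the initial value of numClusters.
    static double cutoffWithInitialDenominator(double beta, int r, int nC0) {
        return Math.pow(beta, r) / nC0;
    }

    // Proposed alternative: the denominator tracks the current numClusters.
    static double cutoffWithCurrentDenominator(double beta, int r, int numClusters) {
        return Math.pow(beta, r) / numClusters;
    }

    public static void main(String[] args) {
        double beta = 1.3;                       // assumed growth factor
        int nC0 = 10;                            // assumed initial numClusters
        int[] numClustersByRound = {10, 14, 20, 27}; // assumed growth of numClusters

        for (int r = 0; r < numClustersByRound.length; r++) {
            double fixed = cutoffWithInitialDenominator(beta, r, nC0);
            double tracking = cutoffWithCurrentDenominator(beta, r, numClustersByRound[r]);
            System.out.printf("r=%d numClusters=%d fixed=%.4f tracking=%.4f%n",
                    r, numClustersByRound[r], fixed, tracking);
        }
    }
}
```

With the tracking denominator, the cutoff grows more slowly whenever
numClusters has grown beyond nC0, which is the behavioral difference the
question is about.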

> Creating a new page for StreamingKMeans documentation on mahout website
> -----------------------------------------------------------------------
>
>                 Key: MAHOUT-1468
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1468
>             Project: Mahout
>          Issue Type: Documentation
>          Components: Documentation
>    Affects Versions: 1.0
>            Reporter: Pavan Kumar N
>            Assignee: Andrew Musselman
>              Labels: Documentation
>             Fix For: 1.0
>
>         Attachments: StreamingKMeans.txt
>
>
> A separate page is required for the Streaming K Means algorithm, giving a description and overview, explaining the various parameters that can be used in streamingkmeans and the strategy for parallelization, and linking to this paper: http://papers.nips.cc/paper/3812-streaming-k-means-approximation.pdf



--
This message was sent by Atlassian JIRA
(v6.2#6252)
