gearpump-dev mailing list archives

From "Kam Kasravi (JIRA)" <>
Subject [jira] [Commented] (GEARPUMP-55) Add kmeans example
Date Wed, 20 Apr 2016 12:26:25 GMT


Kam Kasravi commented on GEARPUMP-55:

Imported from [#2006|]

> Add kmeans example
> ------------------
>                 Key: GEARPUMP-55
>                 URL:
>             Project: Apache Gearpump
>          Issue Type: New Feature
>          Components: examples
>    Affects Versions: 0.8.0
>            Reporter: Kam Kasravi
>            Priority: Minor
>             Fix For: 0.8.1
> From [pangolulu|]
> There is a document about streaming k-means in Spark, and I think we can try to implement
> it on Gearpump. Here is my processor topology on Gearpump:
> [Image: processor topology on Gearpump]
> The `Source Processor` produces points over time and broadcasts each point to the
> `Distribution Processor`. The `Distribution Processor` has k tasks, and each task holds one
> center together with the points assigned to it. When a `Distribution Processor` task receives
> a point from the `Source Processor`, it computes the distance from the point to its center and
> sends the distance, the point, and its own `taskId` to the `Collection Processor`. When the
> `Collection Processor` receives these distances, it increments its count of processed points,
> determines whether it is time to update the centers, picks the smallest of the k distances,
> and then sends the point together with the `taskId` of the nearest `Distribution Processor`
> task to all `Distribution Processor` tasks through a broadcast partitioner. When the
> `Distribution Processor` receives this result message, the task with the matching `taskId`
> accumulates the point. If the message indicates that it is time to update the centers, every
> task updates its own center.
> The procedure is fully streaming, so the cluster centers change over time.
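A minimal single-process sketch of the message flow described above, written in plain Python rather than against the Gearpump API: message passing between processors is simulated with direct method calls. `DistributionTask` stands in for one of the k `Distribution Processor` tasks and `CollectionProcessor` for the `Collection Processor`. These names, the `run` driver, and the `batch_size` trigger ("update every N points") are hypothetical. The update rule (move each center to the mean of the points it accumulated since the last update) is also an assumption, since the proposal does not say how centers are recomputed.

```python
import math
from dataclasses import dataclass, field


@dataclass
class DistributionTask:
    """Stands in for one of the k Distribution Processor tasks; owns one center."""
    task_id: int
    center: tuple
    pending: list = field(default_factory=list)  # points assigned since the last update

    def distance(self, point):
        # Euclidean distance from an incoming point to this task's center
        return math.dist(point, self.center)

    def on_result(self, point, winner_id, update):
        # Every task receives the broadcast result; only the winner keeps the point
        if winner_id == self.task_id:
            self.pending.append(point)
        # Assumed update rule (not from the proposal): move the center to the
        # mean of the points accumulated since the last update
        if update and self.pending:
            n = len(self.pending)
            self.center = tuple(sum(coords) / n for coords in zip(*self.pending))
            self.pending.clear()


class CollectionProcessor:
    """Stands in for the Collection Processor: picks the nearest center per point."""

    def __init__(self, k, batch_size):
        self.k = k
        self.batch_size = batch_size  # hypothetical trigger: update every batch_size points
        self.points_seen = 0
        self.inbox = {}  # point_id -> list of (distance, task_id) replies

    def on_distance(self, point_id, point, task_id, dist):
        replies = self.inbox.setdefault(point_id, [])
        replies.append((dist, task_id))
        if len(replies) < self.k:
            return None  # still waiting for the other tasks' distances
        del self.inbox[point_id]
        self.points_seen += 1
        _, winner_id = min(replies)  # smallest distance; ties go to the lower task_id
        update = self.points_seen % self.batch_size == 0
        return point, winner_id, update


def run(points, initial_centers, batch_size):
    tasks = [DistributionTask(i, c) for i, c in enumerate(initial_centers)]
    collector = CollectionProcessor(k=len(tasks), batch_size=batch_size)
    for point_id, point in enumerate(points):
        # Source Processor broadcasts the point to all k Distribution tasks
        for task in tasks:
            result = collector.on_distance(point_id, point, task.task_id, task.distance(point))
        # The k-th reply completes the point; Collection broadcasts the result back
        for task in tasks:
            task.on_result(*result)
    return [task.center for task in tasks]
```

For example, `run([(0.0, 0.0), (10.0, 10.0), (1.0, 1.0), (11.0, 11.0)], [(0.5, 0.5), (9.0, 9.0)], batch_size=4)` returns `[(0.5, 0.5), (10.5, 10.5)]`: (0, 0) and (1, 1) are assigned to task 0, (10, 10) and (11, 11) to task 1, and both centers move to the means of their points when the update fires on the fourth point. In a real topology the k distances arrive as separate messages, which is why the sketch buffers replies per point id until all k are in.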

This message was sent by Atlassian JIRA
