flink-user mailing list archives

From Chiwan Park <chiwanp...@apache.org>
Subject Re: How to prepare data for K means clustering
Date Thu, 21 Jan 2016 07:21:17 GMT
Hi Ashutosh,

You can use basic Flink DataSet operations such as map and filter to transform your data.
Basically, you have to define a distance metric between the records in your data. The example
uses Euclidean distance (see the euclideanDistance method in the Point class).
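
To illustrate what such a metric looks like once there are more than two columns, here is a
minimal sketch of Euclidean distance over feature vectors of any length. FeatureDistance is
a hypothetical helper name, not part of Flink or of the example's Point class, and it uses
plain Java only.

```java
// Sketch only: FeatureDistance is a hypothetical helper, not a Flink class.
final class FeatureDistance {
    private FeatureDistance() {}

    /** Euclidean distance between two feature vectors of equal length. */
    static double euclidean(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff; // accumulate squared per-column differences
        }
        return Math.sqrt(sum);
    }
}
```

The same loop works for two columns or twenty, so the number of columns you cluster on only
changes the length of the arrays.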

In the map method of the SelectNearestCenter class, the euclideanDistance method measures the
distance between a point and each centroid. For your implementation, replace the Point type
with your own data type (a custom class or a Flink-provided Tuple) and change the distance
metric to suit your data.
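
As a rough sketch of that substitution, the code below turns a customer record into a
numeric feature vector and picks the nearest center. Everything in it is an assumption made
for illustration: the CustomerVectors name, the choice of household income and an ordinal
education level as features, and min-max scaling so that the large income values do not
dominate the distance. It is plain Java; in a Flink job the same logic would sit inside
your own map functions.

```java
// Sketch only: CustomerVectors and its feature choices are hypothetical.
final class CustomerVectors {
    private CustomerVectors() {}

    /** Min-max scales a value into [0, 1]; min and max are assumed known for the column. */
    static double scale(double value, double min, double max) {
        if (max <= min) {
            throw new IllegalArgumentException("max must be greater than min");
        }
        return (value - min) / (max - min);
    }

    /** Builds a feature vector from income and an ordinal education level in 0..maxLevel. */
    static double[] toFeatures(double income, double minIncome, double maxIncome,
                               int educationLevel, int maxLevel) {
        return new double[] {
            scale(income, minIncome, maxIncome),
            scale(educationLevel, 0, maxLevel)
        };
    }

    /** Returns the index of the center closest to the point (squared Euclidean distance). */
    static int nearestCenter(double[] point, double[][] centers) {
        int best = -1;
        double bestDist = Double.MAX_VALUE;
        for (int i = 0; i < centers.length; i++) {
            double dist = 0.0;
            for (int j = 0; j < point.length; j++) {
                double diff = point[j] - centers[i][j];
                dist += diff * diff;
            }
            if (dist < bestDist) { // the square root is not needed to compare distances
                bestDist = dist;
                best = i;
            }
        }
        return best;
    }
}
```

Whether an ordinal encoding is right for a column like education depends on your data;
unordered categories are often one-hot encoded instead.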

Regards,
Chiwan Park

> On Jan 21, 2016, at 4:14 PM, Ashutosh Kumar <ashutosh.discuss@gmail.com> wrote:
> 
> I saw the example code for K-means clustering. It takes input data points as pairs of
> double values (1.2 2.3\n5.3 7.2\.). My question is: how do I convert my business data to
> this format? I have customer data with columns like household income, education, and
> several others. I want to cluster on multiple columns, something like Nielsen segments.
> 
> Thanks
> Ashutosh

