spark-dev mailing list archives

From Meethu Mathew <meethu.mat...@flytxt.com>
Subject Re: Gaussian Mixture Model clustering
Date Fri, 19 Sep 2014 05:38:12 GMT
Hi all,
Please find attached the image of benchmark results. The table in the 
previous mail got messed up. Thanks.



On Friday 19 September 2014 10:55 AM, Meethu Mathew wrote:
> Hi all,
>
> We have come up with an initial distributed implementation of the Gaussian
> Mixture Model in PySpark, where the parameters are estimated using the
> Expectation-Maximization algorithm. Our current implementation assumes a
> diagonal covariance matrix for each component.
> We did an initial benchmark study on a 2-node Spark standalone cluster,
> where each node has 8 cores and 8 GB RAM; the Spark version used
> is 1.0.0. We also evaluated the Python version of k-means available in Spark
> on the same datasets. Below are the results from this benchmark study.
> The reported stats are averages over 10 runs. Tests were done on multiple
> datasets with varying numbers of features and instances.
>
>
> Dataset                    Gaussian mixture model               K-means (Python)
> Instances     Dimensions   Avg time/iteration  100 iterations   Avg time/iteration  100 iterations
> 0.7 million   13           7 s                 12 min           13 s                26 min
> 1.8 million   11           17 s                29 min           33 s                53 min
> 10 million    16           1.6 min             2.7 hr           1.2 min             2 hr
>
>
> We are interested in contributing this implementation as a patch to
> Spark. Does MLlib accept Python implementations? If not, can we
> contribute it to the PySpark component?
> I have created a JIRA for this:
> https://issues.apache.org/jira/browse/SPARK-3588 . How do I get the
> ticket assigned to myself?
>
> Please review and suggest how to take this forward.
>
>
>
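
For readers who want to see the approach described in the quoted mail, here is a minimal single-machine sketch of one EM iteration for a Gaussian mixture with diagonal covariances. It is not the code attached to SPARK-3588. The function name em_step, its parameters, and the 1e-6 variance floor are assumptions made for illustration only. It is written in plain Python so it runs without a cluster.

```python
import math

def em_step(points, weights, means, variances):
    """One EM iteration for a GMM with diagonal covariances (illustrative sketch).

    points:    list of n points, each a list of d floats
    weights:   list of k mixing weights
    means:     k lists of d floats
    variances: k lists of d floats (the diagonal of each covariance matrix)
    Returns (new_weights, new_means, new_variances, log_likelihood), where
    log_likelihood is evaluated under the parameters passed in.
    """
    n, d, k = len(points), len(points[0]), len(weights)
    resp_sum = [0.0] * k                      # sum of responsibilities
    x_sum = [[0.0] * d for _ in range(k)]     # responsibility-weighted sum of x
    xx_sum = [[0.0] * d for _ in range(k)]    # responsibility-weighted sum of x^2
    log_likelihood = 0.0

    for x in points:
        # E-step: log(weight_j * N(x | mean_j, diag(var_j))) for each component
        log_p = []
        for j in range(k):
            lp = math.log(weights[j])
            for t in range(d):
                v = variances[j][t]
                lp -= 0.5 * (math.log(2 * math.pi * v) + (x[t] - means[j][t]) ** 2 / v)
            log_p.append(lp)
        # Normalise with log-sum-exp to avoid underflow
        m = max(log_p)
        log_norm = m + math.log(sum(math.exp(lp - m) for lp in log_p))
        log_likelihood += log_norm
        for j in range(k):
            r = math.exp(log_p[j] - log_norm)  # responsibility of component j for x
            resp_sum[j] += r
            for t in range(d):
                x_sum[j][t] += r * x[t]
                xx_sum[j][t] += r * x[t] * x[t]

    # M-step: re-estimate parameters from the accumulated sums.
    # Assumes every component received some responsibility (resp_sum[j] > 0).
    new_weights = [s / n for s in resp_sum]
    new_means = [[x_sum[j][t] / resp_sum[j] for t in range(d)] for j in range(k)]
    new_variances = [
        [max(xx_sum[j][t] / resp_sum[j] - new_means[j][t] ** 2, 1e-6)  # assumed floor
         for t in range(d)]
        for j in range(k)
    ]
    return new_weights, new_means, new_variances, log_likelihood

# Toy data: two well-separated clusters in 2 dimensions
points = [[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0],
          [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
w, mu, var = [0.5, 0.5], [[1.0, 1.0], [4.0, 4.0]], [[1.0, 1.0], [1.0, 1.0]]
w1, mu1, var1, ll0 = em_step(points, w, mu, var)
w2, mu2, var2, ll1 = em_step(points, w1, mu1, var1)
```

The per-point loop only accumulates three sums per component, so in a distributed setting it could plausibly be expressed as a map over the data followed by a reduce that adds the sums. Whether the implementation in the JIRA is organised that way is not stated in this thread.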

-- 

Regards,

*Meethu Mathew*

*Engineer*

*Flytxt*

Skype: meethu.mathew7

  F: +91 471.2700202


