spark-user mailing list archives

From Sameer Tilak <ssti...@live.com>
Subject MLLib: LinearRegressionWithSGD performance
Date Fri, 21 Nov 2014 19:18:32 GMT
Hi All,

I have been using MLlib's linear regression and I have some questions regarding its performance.
We have a cluster of 10 nodes -- each node has 24 cores and 148GB memory. I am running my
app as follows:
time spark-submit --class medslogistic.MedsLogistic --master yarn-client --executor-memory 6G --num-executors 10 /pathtomyapp/myapp.jar
I am also going to experiment with the number of executors (reducing it); maybe that will
give us different results.
The input is an 800 MB sparse file in LIBSVM format. The total number of features is 150K.
It takes approximately 70 minutes for the regression to finish. The job imposes very little
load on CPU, memory, network, and disk. The total number of tasks is 104, and the total time
is divided fairly uniformly across these tasks. I was wondering, is it possible to reduce
the execution time further?
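
For context, here is a rough back-of-the-envelope sketch (in Python) of how much of the cluster the spark-submit command above actually requests. The node counts, memory sizes, and executor settings are the ones quoted in this message. The one-core-per-executor figure is an assumption: the command does not pass --executor-cores, and to my understanding that is the default on YARN, but it should be checked against the cluster's configuration. The variable names are only illustrative, and YARN's per-executor memory overhead is ignored.

```python
# Cluster figures quoted in the message.
nodes = 10
cores_per_node = 24
mem_per_node_gb = 148

# Values passed to spark-submit.
num_executors = 10
executor_memory_gb = 6

# ASSUMPTION: --executor-cores was not set, so one core per executor is assumed.
cores_per_executor = 1

total_cores = nodes * cores_per_node                    # 240 cores in the cluster
total_mem_gb = nodes * mem_per_node_gb                  # 1480 GB in the cluster
task_slots = num_executors * cores_per_executor         # tasks that can run at once
requested_mem_gb = num_executors * executor_memory_gb   # executor heap requested

core_share = task_slots / total_cores
mem_share = requested_mem_gb / total_mem_gb

print(f"concurrent task slots: {task_slots} of {total_cores} cores ({core_share:.1%})")
print(f"executor memory: {requested_mem_gb} GB of {total_mem_gb} GB ({mem_share:.1%})")
```

Under these assumptions the job would be using only a small fraction of the available cores and memory, which would be consistent with the low load observed on the nodes.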