hadoop-user mailing list archives

From Elaine Gan <elaine-...@gmo.jp>
Subject Re: Is mahout kmeans slow ?
Date Thu, 13 Sep 2012 02:27:09 GMT
Hi,

Sorry, I sent this to the wrong mailing list.
Please ignore it.

Thank you.

> Hi,
> 
> I'm trying to do some text analysis using Mahout kmeans (clustering),
> processing the data on Hadoop, with these settings:
> --numClusters = 160
> --maxIter (-x) = 200
> 
> My data is small, around 500 MB.
> I have 4 servers, each with 4 CPUs, and the TaskTrackers are set to a
> maximum of 4 tasks.
> When I run the Mahout job, I see at most 3 map tasks running, so I
> guess I do not need to do any tuning on this at the moment.
> 
> One iteration takes around 1.5 to 2 minutes to finish.
> I am not sure whether this is normal or considered slow. Can anyone
> give me some advice on this?
> 
> With x = 200, it took me around 200 x 2 mins = 400 mins (roughly 6 to
> 7 hours) to finish the whole analysis.
> Is this unavoidable?
> Does a bigger "x" always mean the kmeans job takes longer to finish?
> 
> Are there any ways to speed up Mahout kmeans?
> 
> Thank you.
> 
> 
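The runtime estimate in the quoted message can be reproduced with a short calculation. This is only a sketch: the function name `estimate_total_minutes` is made up for illustration, and it assumes all 200 iterations actually run. k-means normally stops earlier if the centroids converge, so the result is an upper bound and not a prediction.

```python
def estimate_total_minutes(max_iterations, minutes_per_iteration):
    """Upper bound on wall-clock time, assuming no early convergence."""
    return max_iterations * minutes_per_iteration


# Figures taken from the quoted message: x = 200, 1.5 to 2 minutes per iteration.
low = estimate_total_minutes(200, 1.5)
high = estimate_total_minutes(200, 2.0)

print(f"{low:.0f} to {high:.0f} minutes")
print(f"{low / 60:.1f} to {high / 60:.1f} hours")  # 5.0 to 6.7 hours
```

Under these assumptions the total time grows linearly with "x", so reducing either the iteration cap or the time per iteration shortens the job proportionally.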

