Hi,
Sorry, I sent this to the wrong mailing list.
Please ignore this.
Thank you.
> Hi,
>
> I'm trying to do some text analysis using Mahout k-means (clustering),
> processing the data on Hadoop.
> --numClusters = 160
> --maxIter (-x) = 200
>
> My data is small, around 500 MB.
> I have 4 servers, each with 4 CPUs, and each TaskTracker is set to a
> maximum of 4 tasks.
> When I run the Mahout job, I see at most 3 map tasks, so I guess I
> don't need to do any tuning on that front for the
> moment.
>
> One iteration takes around 1.5 to 2 mins to finish.
> I'm not sure whether this is normal or slow. Can anyone
> give me some advice on this?
>
> With x = 200, it took me around 6 hours (200 iterations x 1.5 to 2 mins each)
> to finish the whole analysis.
> Is this unavoidable?
> Does a bigger "x" always mean the k-means job takes longer to finish?
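The arithmetic behind that estimate, as a small sketch assuming every iteration is one full pass over the data with roughly constant cost (the function name is illustrative only, not part of Mahout). Under that assumption the total wall time is linear in the number of iterations actually run:

```python
def estimate_total_minutes(iterations: int, minutes_per_iteration: float) -> float:
    """Assumes each k-means iteration is one full pass over the data with a
    roughly constant cost, so total time scales linearly with iterations."""
    return iterations * minutes_per_iteration


for per_iter in (1.5, 2.0):
    total = estimate_total_minutes(200, per_iter)
    print(f"200 iterations x {per_iter} min = {total:.0f} min (~{total / 60:.1f} h)")
```

With 1.5 to 2 mins per iteration this gives 300 to 400 mins, i.e. roughly 5 to 6.7 hours.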
>
> Are there any ways to speed up Mahout k-means?
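A minimal stdlib sketch of plain Lloyd's k-means, not Mahout's implementation; the names `kmeans` and `convergence_delta` are hypothetical. It illustrates that the iteration cap is only an upper bound: if the run also stops once no centroid moves more than a threshold, the cost is (iterations actually run) x (time per iteration), which can be far below the cap. Whether and how the Mahout job exposes such a threshold should be checked in its k-means options.

```python
import math
import random


def kmeans(points, k, max_iter=200, convergence_delta=1e-3, seed=0):
    """Lloyd's k-means: stops after max_iter iterations OR as soon as no
    centroid moves more than convergence_delta, whichever comes first."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        max_shift = 0.0
        for i, members in enumerate(clusters):
            if not members:
                continue  # leave an empty cluster's centroid where it was
            new_c = [sum(coord) / len(members) for coord in zip(*members)]
            max_shift = max(max_shift, math.dist(new_c, centroids[i]))
            centroids[i] = new_c
        if max_shift <= convergence_delta:
            return centroids, iteration  # converged before hitting max_iter
    return centroids, max_iter


# Usage: two well-separated 2-D blobs; the loop stops long before max_iter.
rng = random.Random(1)
data = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(100)] + [
    (rng.gauss(10, 0.5), rng.gauss(10, 0.5)) for _ in range(100)
]
centroids, iters = kmeans(data, k=2)
print(f"stopped after {iters} of at most 200 iterations")
```

On real text vectors convergence is usually slower than on toy blobs, so the threshold, not just `max_iter`, decides how long the job runs.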
>
> Thank you.
>
>