mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Is mahout kmeans slow ?
Date Thu, 13 Sep 2012 04:14:30 GMT
Also, with 500MB of data, this is likely to take only a few minutes on a
single machine with the new clustering stuff.  It is hard to estimate
precisely, however, due to the difference between dense and sparse cases.

On Wed, Sep 12, 2012 at 8:42 PM, Pat Ferrel <pat.ferrel@gmail.com> wrote:

> 200 iterations?
>
> What is your convergence delta? If it is too small for your distance
> measure you will perform all 200 iterations, every time you cluster.
>
>   --convergenceDelta (-cd) convergenceDelta
>           The convergence delta value.
>            Default is 0.5
>
> I would loosen the convergence delta and see whether 100 or even 20
> iterations produce good results. You can then tune your other
> parameters and tighten the convergence again if needed. Also
> remember that a good convergence delta depends on your distance measure, so
> you need to think about which distance measure works for your data.
>
> My runs generally take only 10-20 iterations using cosine distance and
> 0.001 as the convergence delta, which would be about 20-40 minutes for you
> at roughly 2 minutes per iteration.
>
> On Sep 12, 2012, at 7:26 PM, Elaine Gan <elaine-gan@gmo.jp> wrote:
>
> Hi,
>
> I'm trying to do some text analysis using Mahout kmeans (clustering),
> processing the data on Hadoop.
> --numClusters = 160
> --maxIter (-x) maxIter = 200
>
> My data is small, around 500MB.
> I have 4 servers, each with 4 CPUs, and the maximum number of TaskTrackers
> is set to 4.
> When I run the Mahout task, I can see that the number of map tasks is
> at most 3, so I guess I do not need to do any tuning on this at the
> moment.
>
> One iteration took around 1.5 to 2 minutes to finish.
> I am not sure whether this is normal or considered slow. Can anyone
> give me advice on this?
>
> With x = 200, the whole analysis took around 200 x 2 mins, roughly 6 to 7
> hours, to finish.
> Is this unavoidable?
> Does a bigger "x" always mean the kmeans job takes longer to finish?
>
> Are there any ways to speed up Mahout kmeans?
>
> Thank you.
>
>
>
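The running-time figures in this thread come from simple multiplication: iterations actually run times the cost of one iteration. A minimal sketch of that arithmetic follows, assuming every iteration costs about the same; the function name `estimate_runtime_minutes` is hypothetical and not part of Mahout.

```python
def estimate_runtime_minutes(iterations, minutes_per_iteration):
    """Total wall-clock minutes, assuming a constant cost per iteration."""
    return iterations * minutes_per_iteration


# Per-iteration cost reported in the thread: 1.5 to 2 minutes.
low_per_iter, high_per_iter = 1.5, 2.0

# Case 1: the convergence delta is never reached, so all 200 iterations run.
worst = (estimate_runtime_minutes(200, low_per_iter),
         estimate_runtime_minutes(200, high_per_iter))

# Case 2: the job converges after 10 to 20 iterations.
typical = (estimate_runtime_minutes(10, low_per_iter),
           estimate_runtime_minutes(20, high_per_iter))

print(f"200 iterations: {worst[0] / 60:.1f} to {worst[1] / 60:.1f} hours")
print(f"10-20 iterations: {typical[0]:.0f} to {typical[1]:.0f} minutes")
```

The point of the comparison is that the iteration cap only sets an upper bound; the time actually spent depends on how early the convergence test stops the job.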

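To illustrate why the convergence delta decides how many iterations actually run, here is a minimal pure-Python k-means sketch. It is a generic textbook loop, not Mahout's implementation: the stopping rule (stop once no centroid has moved further than the delta under the chosen distance measure), the function names, and the toy data are all assumptions made for illustration.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def kmeans(points, centroids, distance, convergence_delta, max_iter):
    """Return (centroids, iterations_used)."""
    for iteration in range(1, max_iter + 1):
        # Assignment step: attach each point to its nearest centroid.
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: distance(p, centroids[i]))
            groups[nearest].append(p)
        # Update step: an empty cluster keeps its previous centroid.
        new_centroids = [mean(g) if g else c
                         for g, c in zip(groups, centroids)]
        # Largest centroid movement in this iteration.
        moved = max(distance(old, new)
                    for old, new in zip(centroids, new_centroids))
        centroids = new_centroids
        if moved <= convergence_delta:
            return centroids, iteration
    # The delta was never reached, so the iteration cap was used up.
    return centroids, max_iter


# Toy data: two obvious directions in 2-D.
points = [[1.0, 0.1], [1.0, 0.2], [0.1, 1.0], [0.2, 1.0]]
start = [[1.0, 0.0], [0.0, 1.0]]

_, iters_tight = kmeans(points, start, cosine_distance, 0.001, 200)
_, iters_loose = kmeans(points, start, cosine_distance, 0.5, 200)
print(iters_tight, iters_loose)
```

On this toy input the looser delta stops one iteration earlier than the tighter one. On real data the gap can be much larger, and a delta that is too small for the distance measure means the loop only ends at the iteration cap.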