mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: AW: Incremental clustering
Date Thu, 12 May 2011 17:15:47 GMT
I think that this may also have to do with whether k-means retains a sense
of weight for the old clusters.  I don't think it currently does.
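[Editorial sketch: the weight-retention idea above can be illustrated outside Mahout. The `kmeans` function and its `weights` parameter below are hypothetical NumPy illustration, not Mahout's KMeansDriver API: each prior center carries a weight equal to the number of old points it summarizes, so it pulls the new mean proportionally instead of being treated as a single point. Warm-starting from converged prior centers should then need fewer iterations, which is the effect discussed downthread.]

```python
import numpy as np

def kmeans(X, centers, weights=None, tol=1e-4, max_iter=20):
    """Lloyd's algorithm with optional per-cluster prior weights.

    weights[j] is the number of old points summarized by centers[j].
    (Hypothetical sketch of weighted warm-starting; not Mahout code.)
    """
    centers = centers.astype(float).copy()
    k = len(centers)
    if weights is None:
        weights = np.zeros(k)
    for it in range(1, max_iter + 1):
        # assign each point to its nearest center
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        new_centers = centers.copy()
        for j in range(k):
            pts = X[labels == j]
            total = weights[j] + len(pts)
            if total > 0:
                # weighted mean: the prior center counts as weights[j] points
                new_centers[j] = (weights[j] * centers[j] + pts.sum(0)) / total
        shift = np.linalg.norm(new_centers - centers, axis=1).max()
        centers = new_centers
        if shift < tol:
            break
    return centers, it
```

A second run seeded with the converged centers of a first run should terminate almost immediately, since the centers barely move; with large prior weights, new data shifts the old centers only slightly.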

On Thu, May 12, 2011 at 10:09 AM, David Saile <david@uni-koblenz.de> wrote:

> I had the same thought, so I actually tried running k-means twice on the
> Reuters dataset (as described in the Quickstart).
> The second run received the resulting clusters of the first run as input.
>
> However, the execution times of the two runs did not differ much (actually
> the 2nd run was a bit slower).
> I also tried doubling the input size and the number of iterations, but saw
> no improvement.
>
> Could this be caused by running Hadoop on a single machine?
> Or is the number of iterations with 20 (or 40) simply not high enough?
>
> David
>
>
> Am 12.05.2011 um 18:46 schrieb Jeff Eastman:
>
> > Also, if cluster training begins with the posterior from a previous
> training session over the corpus, with new data added since that training
> began, then the prior clusters should already be very close to an optimal
> solution for the augmented data, and the number of iterations required to
> converge on a new posterior should be reduced. I haven't tried this in
> practice, but it seems logical. Convergence is measured by how much each
> cluster has changed during an iteration.
> >
> > -----Original Message-----
> > From: Benson Margulies [mailto:bimargulies@gmail.com]
> > Sent: Thursday, May 12, 2011 9:14 AM
> > To: user@mahout.apache.org
> > Subject: Re: AW: Incremental clustering
> >
> > Is the idea here that you are going to be presented with many
> > different corpora that have some sort of overall resemblance, so that
> > priors derived from the first N speed up clustering N+1?
> >
> > --benson
> >
>
>
