mahout-user mailing list archives

From Ted Dunning
Subject Re: boost selected dimensions in kmeans clustering
Date Thu, 15 Jan 2015 14:53:13 GMT
On Thu, Jan 15, 2015 at 5:23 AM, Miguel Angel Martin junquera wrote:

> My question is:
> Is it better to scale up these dimensions directly in the final mixed
> tf-idf sequence file using these correction factors, OR to first scale
> up each set of tf-vectors, then mix the vectors and recalculate the final
> tf-idf, in order to minimize errors or deviations in a subsequent
> clustering of these final mixed tf-idf vectors?

Mathematically, it doesn't matter whether you scale the vectors at
generation time, just before computing distances, or inside the
distance computation itself.
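
As a rough illustration of that equivalence, here is a minimal Python
sketch. It assumes plain Euclidean distance and per-dimension weights that
multiply the coordinates directly; the vectors, the weights, and the helper
names (scale, euclidean, weighted_euclidean, centroid) are invented for
this example and are not Mahout APIs.

```python
import math

def scale(v, w):
    # Multiply each coordinate by its (hypothetical) boost factor.
    return [wi * x for x, wi in zip(v, w)]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def weighted_euclidean(a, b, w):
    # Same boost factors, applied inside the distance instead.
    return math.sqrt(sum((wi * (x - y)) ** 2 for x, y, wi in zip(a, b, w)))

def centroid(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

# Made-up tf-idf style vectors; dimension 1 is boosted by a factor of 3.
weights = [1.0, 3.0, 1.0]
doc_a = [0.2, 0.5, 0.0]
doc_b = [0.1, 0.1, 0.7]

# Option 1: scale the vectors up front, then use the ordinary metric.
d_prescaled = euclidean(scale(doc_a, weights), scale(doc_b, weights))

# Option 2: leave the vectors alone and apply the weights in the metric.
d_in_metric = weighted_euclidean(doc_a, doc_b, weights)

assert math.isclose(d_prescaled, d_in_metric)

# Centroids commute with scaling too, so the k-means update step agrees.
c_then_scale = scale(centroid([doc_a, doc_b]), weights)
scale_then_c = centroid([scale(doc_a, weights), scale(doc_b, weights)])
assert all(math.isclose(p, q) for p, q in zip(c_then_scale, scale_then_c))
```

One detail to watch: if the metric is instead written as the sum of
w_i * (x_i - y_i)^2, the matching factor for pre-scaling the vectors is
sqrt(w_i) rather than w_i.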

The options do differ in how easy they are to program.  The two easiest
places tend to be at the beginning (if you already know the weights), since
you have to write the vector generation code anyway, or at the end, since
some programs have provisions for changing the distance metric.
