On Thu, Jan 15, 2015 at 5:23 AM, Miguel Angel Martin junquera <
mianmarjun.mailinglist@gmail.com> wrote:
> My question is:
> Is it better to scale up these dimensions directly in the final mixed
> tfidf sequence file using these correction factors, OR to first scale
> up each of the tfvectors, then mix the vectors and recalculate the
> final tfidf, in order to minimize errors or deviations in a subsequent
> clustering of these final mixed tfidf vectors?
>
Mathematically, it doesn't matter whether you scale the vectors at
generation time, just before computing distances, or inside the
distance computation itself.
Where you make the change mostly affects how easy it is to program. The
two easiest places tend to be at the beginning (if you know the weights
in advance), since you have to write that code anyway, or at the end,
since some programs have provisions for changing the metric.
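
As a rough illustration of the equivalence for a Euclidean distance, here
is a small Python sketch. The vectors, the weights, and the helper names
(scale, euclidean, weighted_euclidean) are all made up for the example and
are not taken from any particular library. It assumes the correction
factors are simple per-dimension multipliers.

```python
import math

def scale(vec, weights):
    """Multiply each dimension of vec by its correction factor."""
    return [w * x for w, x in zip(weights, vec)]

def euclidean(a, b):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def weighted_euclidean(a, b, weights):
    """Euclidean distance with the per-dimension factors applied
    inside the metric instead of to the stored vectors."""
    return math.sqrt(sum((w * (x - y)) ** 2
                         for w, x, y in zip(weights, a, b)))

# Hypothetical vectors and correction factors, for illustration only.
a = [0.5, 1.2, 0.0]
b = [0.1, 0.4, 2.0]
weights = [1.0, 3.0, 0.5]

# Option 1: scale the vectors first, then use the ordinary metric.
d_scaled_first = euclidean(scale(a, weights), scale(b, weights))

# Option 2: leave the vectors alone and scale inside the metric.
d_scaled_in_metric = weighted_euclidean(a, b, weights)

# Both routes agree up to floating-point rounding.
assert math.isclose(d_scaled_first, d_scaled_in_metric)
```

Since the two distances agree, a clustering that depends only on these
distances should see the same geometry either way, so the choice comes
down to which spot is more convenient in your pipeline.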
