mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: How to improve clustering?
Date Wed, 27 Mar 2013 04:51:00 GMT
Uh...

Shouldn't you be doing the IDF weighting *before* you normalize the vector
length?

On Tue, Mar 26, 2013 at 5:44 PM, Sebastian Briesemeister <
sebastian.briesemeister@unister-gmbh.de> wrote:

> ...
> For each document, I set a field in the corresponding vector to 1 if it
> contains a word. Then I normalize each vector using the L2-norm.
> Finally I multiply each element (representing a word) in the vector by
> log(#documents/#documents_with_word).
>
> For clustering, I am using cosine similarity.
>
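
To make the two orderings concrete, here is a minimal Python sketch (not
Mahout code). The toy corpus and every function name are made up for
illustration; the IDF formula is the log(#documents/#documents_with_word)
from the quoted message. It only shows that weighting first and normalizing
last leaves unit-length vectors, while the reverse order does not.

```python
import math

# Hypothetical toy corpus: each document is a list of words.
docs = [
    ["apache", "mahout", "clustering"],
    ["apache", "hadoop"],
    ["mahout", "kmeans", "clustering"],
]

def idf_weights(corpus):
    """IDF per word: log(#documents / #documents_with_word)."""
    n = len(corpus)
    df = {}
    for doc in corpus:
        for word in set(doc):
            df[word] = df.get(word, 0) + 1
    return {word: math.log(n / count) for word, count in df.items()}

def l2_norm(vec):
    return math.sqrt(sum(v * v for v in vec.values()))

def l2_normalize(vec):
    norm = l2_norm(vec)
    if norm == 0.0:
        return dict(vec)  # leave an all-zero vector unchanged
    return {word: v / norm for word, v in vec.items()}

def idf_then_normalize(doc, idf):
    """Suggested order: binary presence * IDF, then L2-normalize."""
    weighted = {word: 1.0 * idf[word] for word in set(doc)}
    return l2_normalize(weighted)

def normalize_then_idf(doc, idf):
    """Order from the quoted message: L2-normalize, then multiply by IDF."""
    unit = l2_normalize({word: 1.0 for word in set(doc)})
    return {word: v * idf[word] for word, v in unit.items()}

def cosine(a, b):
    dot = sum(v * b.get(word, 0.0) for word, v in a.items())
    denom = l2_norm(a) * l2_norm(b)
    return dot / denom if denom else 0.0

idf = idf_weights(docs)
for i, doc in enumerate(docs):
    first = idf_then_normalize(doc, idf)
    last = normalize_then_idf(doc, idf)
    print(i, "IDF first, norm =", l2_norm(first),
          "| IDF last, norm =", l2_norm(last))
```

Pairwise cosine similarity is itself scale-invariant, so the ordering mainly
matters wherever vector length enters the computation, for example when
vectors are averaged into cluster centroids.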
