mahout-user mailing list archives

From Ted Dunning <>
Subject Re: irregular kmeans clusters on binary data
Date Fri, 13 Jul 2012 19:13:54 GMT
On Fri, Jul 13, 2012 at 12:09 PM, Masoud Moshref Javadi <> wrote:

> First of all thank you for your response with pictures.
> That's true. Some features are 1 in many points and some are not. That's
> the nature of my problem. But I did not scale features.
> Should I do scaling? may be using a dimension reduction algorithm?

Can you say more about your data?  Can you provide the output of something
like the summary function from R?

Dimensionality reduction will be a disaster if you have badly scaled data.
Dimensionality reduction preserves L_2 distances. If those distances are
already messed up, then it will preserve the mess. You don't want that.
Get the metric right first. Then let's talk.
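
To illustrate why the metric has to come first, here is a minimal sketch in
Python. The data and the helper names (euclidean, standardize) are made up for
illustration, and per-column standardization is only one possible way to
rescale, not necessarily the right one for your data. It shows how a single
large-scale column dominates the L_2 distance until the columns are rescaled.

```python
import math

def euclidean(a, b):
    """Plain L_2 distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def standardize(rows):
    """Rescale each column to zero mean and unit variance.

    Constant columns are mapped to 0.0 to avoid dividing by zero.
    """
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / len(c))
        for c, m in zip(cols, means)
    ]
    return [
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)]
        for row in rows
    ]

# Hypothetical data: column 0 is a binary flag, column 1 is a count
# measured on a much larger scale.
points = [[0, 100], [1, 100], [0, 300], [1, 300]]

# Unscaled: differing in the flag costs 1, differing in the count costs 200,
# so the flag is effectively invisible to k-means.
raw_flag_gap = euclidean(points[0], points[1])
raw_count_gap = euclidean(points[0], points[2])

# Standardized: both columns now contribute equally to the distance.
scaled = standardize(points)
scaled_flag_gap = euclidean(scaled[0], scaled[1])
scaled_count_gap = euclidean(scaled[0], scaled[2])
```

Any projection that preserves the unscaled distances would keep the count
column dominant, which is the mess you do not want preserved.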

Also, if you have 1 of n features, you should almost certainly encode them
as n binary values, not as a single variable with n values.
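
A small sketch of the encoding point, again in Python with made-up category
names and hypothetical helpers (one_hot, sq_dist). Coding the categories as a
single integer imposes an ordering the data does not have; coding them as n
binary values makes every pair of distinct categories equally far apart.

```python
def one_hot(value, categories):
    """Encode a 1-of-n value as n binary indicators."""
    return [1 if value == c else 0 for c in categories]

def sq_dist(a, b):
    """Squared L_2 distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Hypothetical 1-of-n feature with three unordered categories.
categories = ["red", "green", "blue"]

# Single-variable coding: red=0, green=1, blue=2.
# This claims red is twice as far from blue as it is from green.
as_int = {c: [i] for i, c in enumerate(categories)}
int_red_green = sq_dist(as_int["red"], as_int["green"])
int_red_blue = sq_dist(as_int["red"], as_int["blue"])

# n binary values: every pair of distinct categories has the same distance.
as_bits = {c: one_hot(c, categories) for c in categories}
bit_red_green = sq_dist(as_bits["red"], as_bits["green"])
bit_red_blue = sq_dist(as_bits["red"], as_bits["blue"])
```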
