mahout-user mailing list archives

From "Bae, Jae Hyeon" <metac...@gmail.com>
Subject Re: Clustering sparse data
Date Wed, 19 Oct 2011 14:47:41 GMT
I'm sorry, I mixed up distance and similarity. The distance between most
pairs is 1 with CosineDistanceMeasure.
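A minimal sketch of why cosine distance comes out as 1 for most pairs of very sparse vectors. It assumes the usual definition, distance = 1 - cos(a, b), and stores vectors as {index: value} dicts; that representation and the function name are illustrative assumptions, not Mahout's Vector API or the CosineDistanceMeasure source. When two vectors share no nonzero index, their dot product is 0, so the distance is exactly 1.

```python
import math


def cosine_distance(a: dict[int, float], b: dict[int, float]) -> float:
    """1 - cosine similarity for sparse vectors stored as {index: value}."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector for the dot product
    # Only indices that are nonzero in both vectors contribute to the dot product.
    dot = sum(v * b[i] for i, v in a.items() if i in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # assumption: treat an all-zero vector as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


x = {0: 1.0, 5: 2.0}  # nonzero only at indices 0 and 5
y = {3: 1.0, 7: 4.0}  # no nonzero index in common with x
z = {0: 2.0, 5: 4.0}  # same direction as x

print(cosine_distance(x, y))  # 1.0: disjoint supports, so the dot product is 0
print(cosine_distance(x, z))  # close to 0.0: parallel vectors
```

With very sparse data, most pairs of examples have disjoint supports, which is why "mostly 1" is the expected result here rather than a bug in the measure.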

2011/10/19 Ted Dunning <ted.dunning@gmail.com>

> Distance between pairs is mostly zero?  This indicates a real problem. If
> the pairs that you mean are pairs of examples, it isn't so bad, but pairs of
> canopies should have nonzero distance.
>
> Or did you mean pairs of coordinates?
>
> Sent from my iPhone
>
> On Oct 19, 2011, at 8:36, "Bae, Jae Hyeon" <metacret@gmail.com> wrote:
>
> > Hi
> >
> > I am trying to cluster very sparse data. Canopy clustering generates so
> > many canopies that it hits the GC overhead limit. I can change the
> > parameters of canopy clustering, but because the distances between most
> > pairs are 0, changing the parameters does not help much. Even if I increase
> > the -Xmx size, the large number of canopies drives the single reducer of
> > canopy clustering to the GC overhead limit.
> >
> > Could you suggest a better approach for this situation? I could try K-means
> > clustering with a large K, and Locality Sensitive Hashing might be a good
> > candidate, but I am not sure the Likelike implementation is robust and
> > flexible enough to use.
> >
> > Thank you
> >
> > Best, Jae
>
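To illustrate why the canopy count explodes on data like this, here is a sketch of the usual sequential canopy algorithm with a loose threshold T1 and a tight threshold T2 (T1 > T2). The helper `canopy()` is hypothetical, not Mahout's CanopyClusterer; the strict `<` comparisons and the {index: value} sparse representation are assumptions made for the sketch.

```python
import math


def cosine_distance(a: dict[int, float], b: dict[int, float]) -> float:
    """1 - cosine similarity for sparse {index: value} vectors."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    dot = sum(v * b[i] for i, v in a.items() if i in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # assumption: an all-zero vector is maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def canopy(points, t1, t2, distance):
    """Sequential canopy clustering (hypothetical helper, not Mahout's API).

    Returns a list of (center_index, member_indices) pairs.
    """
    if t1 <= t2:
        raise ValueError("canopy clustering expects T1 > T2")
    remaining = list(range(len(points)))
    canopies = []
    while remaining:
        center = remaining.pop(0)  # next unremoved point becomes a center
        members = [center]
        still_remaining = []
        for idx in remaining:
            d = distance(points[center], points[idx])
            if d < t1:  # within the loose threshold: join this canopy
                members.append(idx)
            if d >= t2:  # outside the tight threshold: may seed a later canopy
                still_remaining.append(idx)
        remaining = still_remaining
        canopies.append((center, members))
    return canopies


# Five very sparse vectors with disjoint supports: every pairwise distance is 1.0.
docs = [{i: 1.0} for i in range(5)]
print(len(canopy(docs, t1=0.8, t2=0.5, distance=cosine_distance)))  # 5
print(len(canopy(docs, t1=1.5, t2=1.2, distance=cosine_distance)))  # 1
```

Under these assumptions, and with nonnegative features such as term counts, cosine distance lies in [0, 1] and disjoint-support pairs sit at exactly 1. Any T2 of 1 or less therefore leaves every such point as its own canopy center, while T2 above 1 collapses everything into one canopy, so tuning T1 and T2 mostly toggles between those two extremes.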
