mahout-user mailing list archives

From Kate Ericson <eric...@cs.colostate.edu>
Subject Re: Clustering with KMeans
Date Tue, 08 Feb 2011 23:04:59 GMT
Hey Sharath,

I'm sure there's a better way to check the number of clusters, but you
could try looking over the file that you pulled cluster 197 from, and
see if there are more clusters in it.
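
Something like this rough sketch might help with the counting. It assumes
each cluster in your dump starts a line shaped like the C-197{n=1 ...} sample
you pasted; the regex, the function name, and the first two sample lines are
made up for illustration, so adjust them to whatever your file really has.

```python
import re

# Assumed header shape, based on the sample quoted in this thread:
#   C-197{n=1 c=[...] r=[...]}
# The prefix before the dash is matched loosely, since it is an assumption.
CLUSTER_RE = re.compile(r"^\s*[A-Z]+-(\d+)\{n=(\d+)")


def summarize_clusters(lines):
    """Return {cluster_id: n} for every line that looks like a cluster header."""
    sizes = {}
    for line in lines:
        match = CLUSTER_RE.match(line)
        if match:
            sizes[int(match.group(1))] = int(match.group(2))
    return sizes


# Hypothetical dump lines; only the last one is modelled on the thread's sample.
sample = [
    "C-0{n=1 c=[3:1.000] r=[]}",
    "C-5{n=4 c=[3:2.500] r=[3:0.500]}",
    "C-197{n=1 c=[194:13.118, 346:13.820] r=[0.000, 0.000]}",
]
sizes = summarize_clusters(sample)
print(len(sizes), "clusters; largest id:", max(sizes))
```

Here the largest id is 197 but only three clusters exist, which is the
distinction I was getting at. For a real file you could pass the open file
object straight in, since the function only iterates over lines.
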
I'm not very familiar with the canopy program, but you may want to try
smaller t1 and t2 values - maybe your points are too close together so
they're all ending up in one cluster?
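
To get a feel for how the thresholds behave, here is a small single-pass
canopy sketch as I understand the algorithm. It is not Mahout's code: the
function names and the toy points are invented, and it only assumes that a
point within t1 of a canopy center joins that canopy while a point within t2
of some center does not start a new one. Distances are squared, to mirror
SquaredEuclideanDistanceMeasure.

```python
def squared_euclidean(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def canopy(points, t1, t2, distance=squared_euclidean):
    """Simplified single-pass canopy clustering (expects t1 > t2).

    Returns a list of (center, members) pairs.
    """
    canopies = []
    for p in points:
        strongly_bound = False
        for center, members in canopies:
            d = distance(center, p)
            if d < t1:          # loose threshold: p joins this canopy
                members.append(p)
            if d < t2:          # tight threshold: p will not seed a new canopy
                strongly_bound = True
        if not strongly_bound:
            canopies.append((p, [p]))
    return canopies


# Toy data, purely illustrative.
points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (50, 50)]

sizes_by_threshold = {}
for t1, t2 in [(0.5, 0.25), (8, 4), (10000, 6000)]:
    result = canopy(points, t1, t2)
    sizes_by_threshold[(t1, t2)] = [len(members) for _, members in result]
    print(f"t1={t1} t2={t2} -> canopy sizes {sizes_by_threshold[(t1, t2)]}")
```

On this toy data, very small thresholds give six canopies of one point each,
mid-range values give canopies of 3, 2, and 1 points, and very large values
give a single canopy holding everything. So it may be worth sweeping t1 and
t2 in both directions and watching how n changes.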

-Kate

On Tue, Feb 8, 2011 at 3:52 PM, sharath jagannath
<sharathjagannath@gmail.com> wrote:
> btw it is just canopies generated by CanopyDriver.
>
> On Tue, Feb 8, 2011 at 2:32 PM, Kate Ericson <movingb0x@gmail.com> wrote:
>
>> Hi Sharath,
>>
>> So do you have 197 clusters, or just one cluster where the id is 197?
>> The ids don't always correspond to the number of clusters you have.
>>
>> -Kate
>>
>> On Tue, Feb 8, 2011 at 2:46 PM, sharath jagannath
>> <sharathjagannath@gmail.com> wrote:
>> > Now with t1=800, t2=750, SquaredEuclideanDistanceMeasure, I have 197
>> > clusters:
>> >
>> > C-197{n=1 c=[194:13.118, 346:13.820, 497:13.118, 620:13.118, 1224:11.650]
>> > r=[0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000,
>> > 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000,
>> > 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000,
>> > 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000,
>> > 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000,
>> > 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000,
>> > 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000,
>> > 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000,
>> > 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000,
>> > 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000,
>> > 0.000, 0.000, 0.000...
>> >
>> >
>> > From the above sample output you can see cluster Id 197, centroids,
>> > number of points and radius.
>> >
>> > For any value of t1 and t2 I always get n = 1. This is quite strange.
>> >
>> > Does it have anything to do with my dataset? Sorry for the confusion.
>> > All this while I have been saying the number of clusters was 1.
>> >
>> >
>> >
>> > Thanks,
>> >
>> > Sharath
>> >
>>
>
>
>
> --
> Thanks,
> Sharath Jagannath
>
