mahout-user mailing list archives

From Jeff Eastman <j...@windwardsolutions.com>
Subject Re: Dirichlet process clustering
Date Fri, 08 Feb 2013 18:18:19 GMT
What kind of data are you clustering?
Which model distribution are you using?
How many iterations are you running?
How do the cluster n= values change as you increase the number of 
iterations?
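
One way to answer the last question is to pull the n= values out of the clusterdump text after each run and compare them. The following is only a minimal sketch in Python, assuming the dump lines look like the DC-0/DC-1 lines quoted below; the regular expression and the helper name cluster_counts are illustrative and are not part of Mahout.

```python
import re

# Matches header lines such as
#   DC-0 total= 1152000 model= GC:0{*n=1152* c=[0:0.014, ...
# The asterisks around n= are treated as optional, since they may be
# mail emphasis rather than part of the real clusterdump output.
CLUSTER_LINE = re.compile(
    r"(DC-\d+)\s+total=\s*(\d+)\s+model=\s*\S*?\{\*?n=(\d+)\*?"
)

def cluster_counts(dump_text):
    """Return {cluster_id: (total, n)} for each cluster header line found."""
    counts = {}
    for line in dump_text.splitlines():
        match = CLUSTER_LINE.search(line)
        if match:
            cluster_id, total, n = match.groups()
            counts[cluster_id] = (int(total), int(n))
    return counts

# Sample lines copied from the quoted message (truncated as in the mail).
sample = """\
DC-0 total= 1152000 model= GC:0{*n=1152* c=[0:0.014, 1:0.004, 2:0.001,
DC-1 total= 0 model= GC:1{*n=0* c=[0.085, 0.101, 1.617, -1.592, 0.721,
"""

for cluster_id, (total, n) in sorted(cluster_counts(sample).items()):
    print(cluster_id, "total =", total, "n =", n)
```

Running this on the dump from each iteration count shows at a glance whether any cluster other than DC-0 ever receives points.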



On 2/7/13 11:35 AM, Aysu Ezen wrote:
> Hello,
>
> I am having difficulty with Dirichlet process clustering and would highly
> appreciate any help.
> Dirichlet clustering groups all instances of my data into one single
> cluster, no matter how many iterations I try.
> The clusterdump output looks like this:
> DC-0 total= 1152000 model= GC:0{*n=1152* c=[0:0.014, 1:0.004, 2:0.001,
> 3:0.005, 5:0.004
> ...
> DC-1 total= 0 model= GC:1{*n=0* c=[0.085, 0.101, 1.617, -1.592, 0.721,
> -1.618, 0.550, 0.302
> ...
>
> I thought the problem might be in the way the input is read; however,
> when I tried the Reuters dataset, the output was similar:
> DC-0 total= 320 model= GC:0{*n=32* c=[2.886, 0.210, 0.167, 0.210, 0.664,
> 0.254, 0.486,
> ...
> DC-1 total= 0 model= GC:1{*n=0* c=[-0.217, -0.522, 1.138, 0.399, -0.314,
> 1.063, -0.967,
> When I use the dictionary for the Reuters dataset, it prints reasonable
> words for the clusters, such as:
> :DC-0 total
> Top Terms:
> d                                       =>   48.25068240612745
> 5                                       =>   45.90837124735117
> said                                    =>   44.70690381526947
> topics                                  =>   44.07638777047396
> 22                                      =>   39.78152487426996
> companies                               =>   38.85674291104078
> date                                    =>   38.47198750451207
> unknown                                 =>   38.33379830792546
> reuters                                 =>   37.93209125474095
> title                                   =>   37.45820361748338
> :DC-1 total
> Top Terms:
> foreclosed                              =>   3.973533371410058
> 18749                                   =>   3.945486656800688
> jannock                                 =>  3.8038475335990882
> 48.29                                   =>  3.7140637347393706
> asphalt                                 =>  3.6475071525946103
> fragile                                 =>  3.6402008090541895
> compiled                                =>   3.584675891358228
> 642                                     =>  3.5606986939331313
> 6.73                                    =>  3.5492208849250027
> 16334                                   =>  3.5394655632624428
>
>
> Does anybody know the cause of this problem?
>
> Thanks
>

