mahout-user mailing list archives

From WangRamon
Subject What will be a better value for T1 and T2 of a CosineDistanceMeasure
Date Thu, 15 Mar 2012 05:25:38 GMT

Hi All,

I'm tuning the number of clusters for some news input using CosineDistanceMeasure. The
input data is about 11,000 rows, so I tried different settings for t1 and t2. Here are the
results:

1) t1 = 0.6, t2 = 0.9: Reduce output records=60
2) t1 = 0.6, t2 = 0.8: Reduce output records=868
3) t1 = 0.6, t2 = 0.7: Reduce output records=3374

I expect the reduce output (the number of clusters) to be less than 100, and the first
setting matched what I had in mind. What surprised me is how the results change with t2.
My understanding is that cos(25°) is about 0.9 and cos(35°) is about 0.8 (cos(90°) = 0.0),
so setting t2 to cos(35°) should generate fewer clusters than setting it to cos(25°),
because it means two vectors can be more different: the angle between them is larger.
Did I miss something?

Thanks in advance.

Cheers,
Ramon
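
The angle-to-cosine reasoning above can be checked with a short sketch. It assumes that a cosine distance is computed as 1 minus the cosine similarity; that is an assumption here and should be verified against the CosineDistanceMeasure source. The helper names (cosine_similarity, cosine_distance, unit_vector) are hypothetical and not part of Mahout.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_distance(a, b):
    """ASSUMPTION: distance = 1 - similarity (not confirmed by the message)."""
    return 1.0 - cosine_similarity(a, b)


def unit_vector(angle_degrees):
    """2-D unit vector at the given angle from the x-axis."""
    r = math.radians(angle_degrees)
    return (math.cos(r), math.sin(r))


base = unit_vector(0)
for angle in (25, 35, 90):
    v = unit_vector(angle)
    print(
        f"angle={angle:>2} deg  "
        f"similarity={cosine_similarity(base, v):.3f}  "
        f"distance={cosine_distance(base, v):.3f}"
    )
```

Under that assumption, similarity falls as the angle grows while distance rises, so a threshold of 0.9 means something quite different depending on whether it is compared against a similarity or a distance.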