Hi all,

I'm tuning the number of clusters for some news input using CosineDistanceMeasure.
The input data is about 11,000 rows. I tried different settings for t1 and t2; here
are the results:

1) t1 = 0.6, t2 = 0.9: Reduce output records=60
2) t1 = 0.6, t2 = 0.8: Reduce output records=868
3) t1 = 0.6, t2 = 0.7: Reduce output records=3374

I expect the reduce output (the number of clusters) to be less than 100, and the
first setting matches what I had in mind. What surprised me is how the results change
with t2. My understanding is that cos(25°) is about 0.9 and cos(35°) is about 0.8
(cos(90°) = 0.0). So if I set t2 to cos(35°), it should generate fewer clusters than
with t2 set to cos(25°), because that threshold corresponds to two vectors being more
different (the angle between them is larger).

Did I miss something?

Thanks in advance.

Cheers,
Ramon
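
P.S. To make the arithmetic behind my reasoning concrete, here is a minimal sketch in Python using only the standard math module. The helper names cosine_similarity and cosine_distance are mine, for illustration only; they are not Mahout calls. The second helper assumes the common convention that a cosine distance is 1 - cos(angle). I have not checked whether CosineDistanceMeasure follows that convention, so treat it as an assumption.

```python
import math


def cosine_similarity(angle_degrees):
    """Cosine of the angle between two vectors, given the angle in degrees."""
    return math.cos(math.radians(angle_degrees))


def cosine_distance(angle_degrees):
    """ASSUMPTION: distance defined as 1 - cosine similarity (a common
    convention, not verified against CosineDistanceMeasure)."""
    return 1.0 - cosine_similarity(angle_degrees)


if __name__ == "__main__":
    # Print both quantities for the angles mentioned in the question.
    for angle in (25, 35, 90):
        print(
            f"angle={angle:>2} deg  "
            f"similarity={cosine_similarity(angle):.3f}  "
            f"distance={cosine_distance(angle):.3f}"
        )
```

The similarity shrinks as the angle grows, while a distance of the form 1 - cos(angle) grows with it. So whether a larger t2 means "more different" or "more alike" depends on which of the two quantities the threshold is compared against.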
