Yes, Richard is right. I used the arccosine of the value and it solved the
mismatch:
Math.acos(value), which ranges from 0 to π / 2 here, since TFIDF values are
never negative.
"...π / 2 meaning independent, 0 meaning exactly the same, with inbetween
values indicating intermediate similarities or dissimilarities...."
wiki<http://en.wikipedia.org/w/index.php?title=Jaccard_index&section=2#Tanimoto_coefficient_.28extended_Jaccard_coefficient.29>
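To make the acos step concrete, here is a minimal sketch of what I mean. The
class and method names (AngularDistance, cosine, angle) are made up for
illustration and are not part of Mahout; it assumes dense double[] vectors of
equal length.

```java
// Hypothetical helper for illustration only; not a Mahout class.
final class AngularDistance {

    /** Cosine of the angle between two equal-length, non-zero vectors. */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            throw new IllegalArgumentException("cosine is undefined for a zero vector");
        }
        return dot / Math.sqrt(normA * normB);
    }

    /** Angle in radians: 0 for parallel vectors, π / 2 for orthogonal ones. */
    static double angle(double[] a, double[] b) {
        // Clamp first: rounding can push the cosine slightly outside [-1, 1],
        // and Math.acos returns NaN for such inputs.
        double c = Math.max(-1.0, Math.min(1.0, cosine(a, b)));
        return Math.acos(c);
    }
}
```

With non-negative inputs such as TFIDF weights the cosine stays in [0, 1], so
the angle stays in [0, π / 2], which matches the range quoted above.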
I think Tanimoto distance is better suited to binary values, whereas with
TFIDF we have values other than 0s and 1s.
The Pearson correlation that Sean suggested is equivalent to cosine distance
only if the data are 'centered' (have a mean of 0). But I think, as Richard
said, that since TFIDF vectors never contain negative values, they can't
have a mean of 0.
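A small sketch of the point about centering, in case it helps. The names
(CenteredCosine, center, pearson) are hypothetical and this is not the Mahout
implementation; it only shows that Pearson is the cosine of the
mean-subtracted vectors, and that centering is what introduces negative
values.

```java
// Hypothetical illustration only; not the Mahout Pearson implementation.
final class CenteredCosine {

    /** Plain cosine of the angle between two equal-length, non-zero vectors. */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            throw new IllegalArgumentException("cosine is undefined for a zero vector");
        }
        return dot / Math.sqrt(normA * normB);
    }

    /** Returns a copy of v with its mean subtracted, so the result has mean 0. */
    static double[] center(double[] v) {
        double sum = 0.0;
        for (double x : v) {
            sum += x;
        }
        double mean = sum / v.length;
        double[] out = new double[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = v[i] - mean;
        }
        return out;
    }

    /** Pearson correlation computed as the cosine of the centered vectors. */
    static double pearson(double[] a, double[] b) {
        return cosine(center(a), center(b));
    }
}
```

For example, {1, 2, 3} and {3, 5, 7} are all non-negative, so neither has a
mean of 0 and the plain cosine is below 1, while the Pearson value treats them
as perfectly correlated because centering turns them into {-1, 0, 1} and
{-2, 0, 2}.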
Regards,
Dipesh
>
> 2008/12/6 Sean Owen <srowen@gmail.com>
>
> > To answer a few recent points:
> >
> > Not sure if this is helpful, but, the collaborative filtering part of
> > Mahout contains an implementation of cosine distance measure -- sort
> > of. Really it has an implementation of the Pearson correlation, which
> > is equivalent, if the data are 'centered' (have a mean of 0). This is,
> > in my opinion, a good idea. So if you agree, you could copy and adapt
> > this implementation of Pearson to your purpose. It is pretty easy to
> > recreate the actual cosine distance measure correlation too from this
> > code -- I used to have it separately in the code.
> >
> > The Tanimoto distance is a ratio of intersection to union of two sets,
> > so is between 0 and 1. Cosine distance is, essentially, the cosine of
> > an angle in feature-space, so is between -1 and 1.
> >
>


"Help Ever Hurt Never" Baba
