mahout-user mailing list archives

From djn <dere...@gmail.com>
Subject ItemSimilarityJob Cooccurrence Question
Date Sat, 04 Jun 2011 21:21:53 GMT
Regarding ItemSimilarityJob, it is my understanding that if there are two
input lines of the form <user1, product1> and <user1, product2>,
then that would constitute a co-occurrence between product1 and product2.

I've generated a large test dataset under this assumption, and it guarantees
that there will only be co-occurrences between pairs of product IDs that
I've predefined. I'm not using preference values and I'm setting
--booleanData true.

While the ItemSimilarityJob's output does include these predefined
co-occurrences, it also outputs a large number of co-occurrences (with small
co-occurrence counts) between products that are not co-occurring in the
input dataset. Can anyone provide some insight as to why this might be
happening?
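
To make the assumption above concrete, here is a minimal sketch in plain Python of how co-occurrences could be counted directly from boolean <user, item> lines. It is not Mahout's implementation; the function name `cooccurrence_counts`, the comma-separated line format, and the sample IDs are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_counts(lines):
    """Count, for each unordered item pair, how many users have both items.

    Assumes boolean data: each line is "user,item" with no preference value.
    """
    # Collect the distinct items seen for each user.
    items_by_user = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        user, item = line.split(",")[:2]
        items_by_user[user.strip()].add(item.strip())

    # Every pair of items that shares a user counts as one co-occurrence.
    counts = Counter()
    for items in items_by_user.values():
        for a, b in combinations(sorted(items), 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical sample input in the <user, product> form described above.
sample = [
    "user1,product1",
    "user1,product2",
    "user2,product1",
    "user2,product2",
    "user3,product3",
]
counts = cooccurrence_counts(sample)
```

Running the same count over the generated dataset and comparing `set(counts)` against the predefined pairs would show whether any unexpected pairs are already present in the input before the job runs.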

