mahout-user mailing list archives

From Sean Owen <sro...@gmail.com>
Subject Re: Getting Taste to work on 10M dataset
Date Sat, 30 Jan 2010 02:40:26 GMT
On Sat, Jan 30, 2010 at 2:34 AM, Vinicius Carvalho
<viniciusccarvalho@gmail.com> wrote:
> I'm trying the 5.1.10 the latest one available at maven repositories,
> running it right now, since it takes a while, I'll inform of the results

OK but this would be something you can check in your table right now.
No columns should be nullable, or have nulls. If they do, that's the
problem.


> At first I'm just creating the slopeonerecommender. did not even get to the
> actual code, all that time is used on the construction of the object

OK then it's the time spent in building diffs.


> You mean for the DiffStorage right? The datamodel would be good to be at
> JDBC right? I'm interested in item2item recommendations. I did this before

For both, 10M ratings isn't terribly big. I think you can get it into
memory in 2GB, plus the diffs, if you cap the number of diffs at some
reasonable value.
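
As a rough illustration of what "building diffs" with a cap means, here is a
minimal plain-Java sketch. It does not use Mahout's classes; the names
SlopeOneDiffSketch, buildDiffs and maxEntries are made up for this example,
and the capping policy (stop adding new item pairs once the limit is hit, but
keep updating existing ones) is an assumption, not necessarily what the
library does.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Mahout code: slope-one average rating
// differences per item pair, with a cap on how many pairs are kept.
class SlopeOneDiffSketch {

    // Running total of (rating of second item - rating of first item).
    static final class Diff {
        int count;
        double sum;
        double average() { return sum / count; }
    }

    // ratings: userID -> (itemID -> rating). Keys of the result are
    // "lowerItemID:higherItemID".
    static Map<String, Diff> buildDiffs(Map<Long, Map<Long, Double>> ratings,
                                        int maxEntries) {
        Map<String, Diff> diffs = new HashMap<>();
        for (Map<Long, Double> prefs : ratings.values()) {
            for (Map.Entry<Long, Double> a : prefs.entrySet()) {
                for (Map.Entry<Long, Double> b : prefs.entrySet()) {
                    if (a.getKey() >= b.getKey()) {
                        continue; // visit each unordered pair once
                    }
                    String key = a.getKey() + ":" + b.getKey();
                    Diff d = diffs.get(key);
                    if (d == null) {
                        if (diffs.size() >= maxEntries) {
                            continue; // cap reached: skip new pairs
                        }
                        d = new Diff();
                        diffs.put(key, d);
                    }
                    d.count++;
                    d.sum += b.getValue() - a.getValue();
                }
            }
        }
        return diffs;
    }
}
```

The number of item pairs grows roughly with the square of the number of
items, which is why the cap matters more for memory than the 10M ratings do.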

> using taste by hand by computing the SimilarityMatrix and storing it on DB.
> (I used as reference the book Collective Intelligence in action) and it
> worked fine. Just the Similarity Matrix took a while to be recalculated but
> it was a batch job running every hour. After that computing recommendations
> was just a breeze.

You mean you are interested in item-based recommenders, or
recommending items to other items?

Slope-one wouldn't have anything to do with item-item similarities, it
works a bit differently. Yes, you could pre-compute similarities and
use them with a custom ItemSimilarity implementation which reads from
a DB table, and use that with GenericItemBasedRecommender.

You could also do the similarity calculations with something like
PearsonCorrelationSimilarity, and store that in the DB, and proceed
with the above. Again, you'd have to write a little code, but it's pretty
easy.
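
To make the pre-compute-and-look-up idea concrete, here is a hedged plain-Java
sketch. It is not Mahout's PearsonCorrelationSimilarity or its ItemSimilarity
interface; ItemSimilaritySketch, precompute and lookup are hypothetical names,
and an in-memory map stands in for the DB table. The correlation is taken over
the users who rated both items, which is an assumption about how you would
want to define it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Mahout code: Pearson item-item similarities
// computed in a batch and read back through a symmetric lookup.
class ItemSimilaritySketch {

    // Each argument maps userID -> rating for one item.
    static double pearson(Map<Long, Double> ratingsA, Map<Long, Double> ratingsB) {
        int n = 0;
        double sumA = 0, sumB = 0, sumAA = 0, sumBB = 0, sumAB = 0;
        for (Map.Entry<Long, Double> e : ratingsA.entrySet()) {
            Double b = ratingsB.get(e.getKey());
            if (b == null) {
                continue; // only users who rated both items count
            }
            double a = e.getValue();
            n++;
            sumA += a;
            sumB += b;
            sumAA += a * a;
            sumBB += b * b;
            sumAB += a * b;
        }
        if (n < 2) {
            return Double.NaN; // not enough overlap to correlate
        }
        double cov = sumAB - sumA * sumB / n;
        double varA = sumAA - sumA * sumA / n;
        double varB = sumBB - sumB * sumB / n;
        double denom = Math.sqrt(varA * varB);
        return denom == 0.0 ? Double.NaN : cov / denom;
    }

    // byItem: itemID -> (userID -> rating). The returned map plays the
    // role of the similarity table; undefined similarities are not stored.
    static Map<String, Double> precompute(Map<Long, Map<Long, Double>> byItem) {
        List<Long> items = new ArrayList<>(byItem.keySet());
        Map<String, Double> table = new HashMap<>();
        for (int i = 0; i < items.size(); i++) {
            for (int j = i + 1; j < items.size(); j++) {
                long a = items.get(i);
                long b = items.get(j);
                double sim = pearson(byItem.get(a), byItem.get(b));
                if (!Double.isNaN(sim)) {
                    table.put(key(a, b), sim);
                }
            }
        }
        return table;
    }

    // Symmetric lookup; returns NaN when no similarity was stored.
    static double lookup(Map<String, Double> table, long item1, long item2) {
        Double sim = table.get(key(item1, item2));
        return sim == null ? Double.NaN : sim;
    }

    private static String key(long a, long b) {
        return Math.min(a, b) + ":" + Math.max(a, b);
    }
}
```

A custom ItemSimilarity would do the equivalent of lookup() against the table
the batch job writes.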

Or you could skip the DB altogether and let it compute item-item
similarities on the fly.
