mahout-user mailing list archives

From Sean Owen <sro...@gmail.com>
Subject Re: Taste in production?
Date Fri, 06 Mar 2009 19:31:04 GMT
I am actually not sure. Over the years I've heard from plenty of users,
but I've never had a clear picture of where it ended up in use.
Dumbhippo used it, a long while ago. One other dev here is involved
in setting it up for a reasonably big installation, though I don't know
that he can share details.

My party line on this is I don't know of any reason it wouldn't be
suitable for use in production -- other than the perennial issue of
scale. In its non-distributed form I think it's good for sites with a
data set of up to about 10M preferences (after you prune out the noise
and questionable data). I think it's been pretty well debugged at this
point, and the design won't change much now that I've overhauled it in
Mahout.

To give a data point, I've been running a slope one recommender
against the GroupLens data set of 10 million movie ratings on a big
(7GB, 4 CPU) instance on Amazon EC2, and can generate recommendations
in about 300ms per user. That's still reasonable for real-time use; you
could push this to larger data sets if you're running the
recommendation computation offline in batch.

(And on that point, I have an Amazon EC2 AMI that will take care of that part.)
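
For anyone unfamiliar with slope one, here is a minimal sketch of the
weighted variant in Python. It is only an illustration of the idea, not
the Taste implementation: the function names (build_diffs, predict,
recommend) and the dict-of-dicts preference format are assumptions made
up for this example.

```python
from collections import defaultdict


def build_diffs(prefs):
    """prefs is {user: {item: rating}} (an assumed, illustrative format).

    Returns the average rating difference and the co-rating count for
    every ordered pair of items rated together by at least one user.
    """
    diff_sum = defaultdict(float)
    count = defaultdict(int)
    for ratings in prefs.values():
        for i, ri in ratings.items():
            for j, rj in ratings.items():
                if i != j:
                    diff_sum[(i, j)] += ri - rj
                    count[(i, j)] += 1
    avg_diff = {pair: diff_sum[pair] / count[pair] for pair in diff_sum}
    return avg_diff, dict(count)


def predict(user_ratings, item, avg_diff, count):
    """Weighted slope one estimate of the user's rating for item.

    Returns None when no item the user rated was co-rated with item.
    """
    num = 0.0
    den = 0
    for j, rj in user_ratings.items():
        n = count.get((item, j), 0)
        if n:
            num += (avg_diff[(item, j)] + rj) * n
            den += n
    return num / den if den else None


def recommend(user, prefs, avg_diff, count, how_many=5):
    """Top estimated items the user has not rated yet, best first."""
    seen = prefs[user]
    all_items = {i for ratings in prefs.values() for i in ratings}
    scored = []
    for item in all_items - set(seen):
        est = predict(seen, item, avg_diff, count)
        if est is not None:
            scored.append((est, item))
    scored.sort(key=lambda t: (-t[0], str(t[1])))
    return [(item, est) for est, item in scored[:how_many]]
```

The pairwise difference table is the expensive part and is what has to
fit in memory; once it is built, each prediction is just a weighted
average over the items the user has already rated.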

For much bigger data sets, you'd need to bring in something like
Hadoop. We've got bindings for Hadoop to distribute the computation,
though that part isn't battle-tested. In fact, so far the recommender
algorithms themselves aren't parallelized (the existing ones can't
really be), so what it's really doing is just running many recommenders
at once to scale. That still means the size of your box limits how
much data you can get into memory, and so forth.
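
To make the "many recommenders at once" idea concrete, here is a rough
sketch in Python. It does not use Hadoop or the Mahout bindings; the
names (partition_users, run_worker, run_all), the integer user IDs, and
the recommend_fn callback are all assumptions for illustration. Each
worker is assumed to hold the whole model in memory and only the list
of users is split up, which is why the size of the box remains the
limit.

```python
def partition_users(user_ids, num_workers):
    """Split integer user IDs into num_workers disjoint shards."""
    shards = [[] for _ in range(num_workers)]
    for uid in user_ids:
        shards[uid % num_workers].append(uid)
    return shards


def run_worker(shard, recommend_fn):
    """One worker: the full recommender, applied to its shard only."""
    return {uid: recommend_fn(uid) for uid in shard}


def run_all(user_ids, num_workers, recommend_fn):
    """Run every shard and merge the results.

    The loop here is sequential; in a real deployment each shard would
    go to a separate process or machine.
    """
    results = {}
    for shard in partition_users(user_ids, num_workers):
        results.update(run_worker(shard, recommend_fn))
    return results
```

Adding workers this way raises throughput, but it does nothing for the
memory needed per worker, since none of the model is split.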


... and here I'm talking about just using the stock algorithms as-is.
I think the best (fastest, most accurate) recommender system for a
given domain will probably need to take advantage of the domain's
features. So, it's possible to improve the above a lot if you can
make more assumptions about the data.

This is where I can dig in a bit with you to assess what you need,
what you can do, and whether it roughly looks like it'll work.


On Fri, Mar 6, 2009 at 6:12 PM, Matthew Runo <mruno@zappos.com> wrote:
> Hello folks!
>
> There's been some buzz around the office about recommendation engines
> lately, and knowing what I know about Taste I thought I'd send in a request
> to see it "in the wild". Are there any largish sites (or small ones) working
> with recommendations from Taste / Mahout that I can look at?
>
> Thanks for your time!
>
> Matthew
>
>
>
