mahout-user mailing list archives

From: Ted Dunning <ted.dunn...@gmail.com>
Subject: Re: Taste speed
Date: Tue, 24 Nov 2009 20:04:47 GMT
As another data point, at Veoh we analyzed 7 months of data at 100-250
million events per day.  This involved a few tens of millions of users and a
few million items.  We down-sampled the most common items so that no item
had more than 1000 users.
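
For concreteness, here is a minimal sketch of that kind of per-item cap
(plain in-memory Java with made-up names; the real down-sampling ran inside
the Hadoop jobs, so this is illustrative only):

    import java.util.Collections;
    import java.util.List;
    import java.util.Map;
    import java.util.Random;

    /** Sketch only: cap each item's user list by uniform random sampling. */
    public final class ItemDownSampler {

      private static final int MAX_USERS_PER_ITEM = 1000;

      public static void downSample(Map<String, List<String>> itemToUsers,
                                    Random rng) {
        for (Map.Entry<String, List<String>> e : itemToUsers.entrySet()) {
          List<String> users = e.getValue();
          if (users.size() > MAX_USERS_PER_ITEM) {
            Collections.shuffle(users, rng);  // pick a uniform random subset
            e.setValue(users.subList(0, MAX_USERS_PER_ITEM));
          }
        }
      }
    }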

Offline analysis took 10 hours on about 10-20 cores (this was Hadoop 0.15 and
then 0.16).  Recommendations were done by various technologies at different
times, but we typically had to produce 100-800 recommendations per second.
Initially, we used a combined Solr instance for all search, navigation, site
structuring and recommendation, but it quickly became clear that it was
better to specialize.  Later we used a specially designed static web site to
serve the item vectors and combined them on light-weight servers or even in
the browser.  That let us have good cacheability and enormous scalability.

The actual algorithm was to weight item vectors according to the IDF of each
item in the user's history, but otherwise just to use a fixed score as a
function of rank within the item vectors themselves.  This worked as well as
any fancier solution we tried.  Item vectors came directly from the
cooccurrence matrix, sparsified using LLR.
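
In rough Java, that combination looks like the sketch below (names are
illustrative, and the 1/(rank+1) schedule is an assumption -- the fixed
score-vs-rank function above isn't specified):

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    /**
     * Sketch: relatedItems.get(h) is item h's row of the LLR-sparsified
     * cooccurrence matrix, ordered by decreasing LLR; idf.get(h) is
     * something like log(numUsers / userCount(h)) for history item h.
     */
    public final class HistoryScorer {

      public static Map<String, Double> score(List<String> history,
          Map<String, List<String>> relatedItems, Map<String, Double> idf) {
        Map<String, Double> scores = new HashMap<String, Double>();
        for (String h : history) {
          List<String> related = relatedItems.get(h);
          if (related == null) {
            continue;
          }
          double weight = idf.get(h);  // IDF weight of the history item
          for (int rank = 0; rank < related.size(); rank++) {
            // Fixed score as a function of rank only, e.g. 1/(rank+1).
            double s = weight / (rank + 1);
            Double old = scores.get(related.get(rank));
            scores.put(related.get(rank), old == null ? s : old + s);
          }
        }
        return scores;
      }
    }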

On Tue, Nov 24, 2009 at 11:46 AM, Otis Gospodnetic <otis_gospodnetic@yahoo.com> wrote:

> Hi,
>
> > Yes, that's quite small.  As a reference, I'm currently writing up a
> > case study on a data set with 130K users and 160K items and
> > recommendation time is from 10ms to 200ms, depending on the algorithm.
> > Your use case seems to require 200-400ms per recommendation -- on a
> > 1-core machine.
>
> Yeah.  That sounds really off to me.
>
> > I'd recommend the -d64 flag if your system is 64-bit, but that's
> > marginal.
>
> Can't do it on my servers. :(
>
> > I think the big slowdown is probably the translation from strings to
> > longs and back.
> >
> > What's your DataModel like?  Getting good performance in the case where
> > you need to do translation can be tricky and I suspect this is the
> > issue.
>
> My data looks like this (user,item), if that's what you're asking:
>
> 111.111.111.111-1629385632.30042258,DHHDE59E0Q920007715
> 222.222.222.222-1251641952.30039838,KDJDE5AJ31I20003422
> 333.333.333.333-1193732240.30032560,AKNDKDKJDJD320079784
>
> I believe the string->long conversion is basically the same as what you
> committed a few months back.
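>
> (For reference, that bridging looks roughly like this; a sketch assuming
> Mahout's MemoryIDMigrator API of that era, which hashes each string to a
> long and keeps the reverse mapping in memory:)
>
>     import org.apache.mahout.cf.taste.impl.model.MemoryIDMigrator;
>
>     MemoryIDMigrator migrator = new MemoryIDMigrator();
>     String raw = "111.111.111.111-1629385632.30042258";
>     long userID = migrator.toLongID(raw);           // hash string -> long
>     migrator.storeMapping(userID, raw);             // keep reverse mapping
>     String original = migrator.toStringID(userID);  // back to the raw key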
>
> Otis
>
> > On Tue, Nov 24, 2009 at 7:10 PM, Otis Gospodnetic wrote:
> > > Hello,
> > >
> > > I've been using Taste for a while, but it's not scaling well, and I
> > > suspect I'm doing something wrong.
> > > When I say "not scaling well", this is what I mean:
> > > * I have 1 week's worth of data (user,item datapoints)
> > > * I don't have item preferences, so I'm using the boolean model (see
> > >   the sketch after this list)
> > > * I have caching in front of Taste, so the rate of requests that Taste
> > >   needs to handle is only 150-300 reqs/minute/server
> > > * The server is an 8-core 2.5GHz 32-bit machine with 32 GB of RAM
> > > * I use 2GB heap (-server -Xms2000M -Xmx2000M -XX:+AggressiveHeap
> > >   -XX:MaxPermSize=128M -XX:+CMSClassUnloadingEnabled
> > >   -XX:+CMSPermGenSweepingEnabled) and Java 1.5 (upgrade scheduled for
> > >   Spring)
> > >
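> > > (A minimal sketch of that boolean setup, assuming Mahout's
> > > GenericBooleanPrefDataModel; the IDs are stand-ins:)
> > >
> > >    // FastByIDMap/FastIDSet: org.apache.mahout.cf.taste.impl.common
> > >    FastByIDMap<FastIDSet> userData = new FastByIDMap<FastIDSet>();
> > >    FastIDSet itemsForUser = new FastIDSet();
> > >    itemsForUser.add(42L);           // presence only, no preference value
> > >    userData.put(7L, itemsForUser);  // userID -> set of item IDs seen
> > >    DataModel model = new GenericBooleanPrefDataModel(userData);
> > >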
> > > ** The bottom line is that with all of the above, I have to filter out
> > > less popular items and less active users in order to be able to return
> > > recommendations in a reasonable amount of time (e.g. 100-200 ms at the
> > > 150-300 reqs/min rate).  In the end, after this filtering, I end up
> > > with, say, 30K users and 50K items, and that's what I use to build the
> > > DataModel.  If I remove filtering and let more data in, the performance
> > > goes down the drain.
> > >
> > > My feeling is that 30K users and 50K items makes for an awfully small
> > > data set, and that Taste, especially at only 150-300 reqs/min on an
> > > 8-core server, should be much faster.  I have a feeling I'm doing
> > > something wrong and that Taste is really capable of handling more
> > > data, faster.  Here is the code I use to construct the recommender:
> > >
> > >    idMigrator = LocalMemoryIDMigrator.getInstance();
> > >    model = MyDataModel.getInstance("itemType");
> > >
> > >    // ItemSimilarity similarity = new LogLikelihoodSimilarity(model);
> > >    similarity = new TanimotoCoefficientSimilarity(model);
> > >    similarity = new CachingUserSimilarity(similarity, model);
> > >
> > >    // hood size is 50, minSimilarity is 0.1, samplingRate is 1.0
> > >    hood = new NearestNUserNeighborhood(hoodSize, minSimilarity,
> > >        similarity, model, samplingRate);
> > >
> > >    recommender = new GenericUserBasedRecommender(model, hood, similarity);
> > >    recommender = new CachingRecommender(recommender);
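> > >
> > > (Per request, it's then just:)
> > >
> > >    List<RecommendedItem> recs = recommender.recommend(userID, 10);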
> > >
> > > What do you think of the above numbers?
> > >
> > > Thanks,
> > > Otis
> > >
>
>


-- 
Ted Dunning, CTO
DeepDyve
