mahout-user mailing list archives

From Josh Devins ...@joshdevins.com>
Subject Re: Top-N recommendations from SVD
Date Wed, 06 Mar 2013 10:54:16 GMT
The factorization at ~2 hours is kind of a non-issue (certainly fast
enough). It was run with (if I recall correctly) 30 reducers across a
35-node cluster, with 10 iterations.

I was a bit shocked at how long the recommendation step took and will throw
some timing debug in to see where the problem lies exactly. There were no
other jobs running on the cluster during these attempts, but it's certainly
possible that something is swapping or the like. I'll be looking more
closely today before I start to consider other options for calculating the
recommendations.
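[For anyone following along: the shape of the recommendation step being discussed (one row of U dotted against every row of M, keeping the top-N scores) can be sketched roughly as below. This is illustrative Java only, not the actual Mahout `RecommenderJob` code; all class and method names here are made up.]

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Illustrative sketch only -- NOT Mahout's actual RecommenderJob code.
// The per-user work in the U*M' step is one dot product per item (k = number
// of features) plus a bounded min-heap that keeps the N best scores.
class TopNScorer {

    // Simple (itemId, score) pair for the result list.
    static final class Scored {
        final int itemId;
        final double score;
        Scored(int itemId, double score) { this.itemId = itemId; this.score = score; }
    }

    // Dot product of one user's factor vector with one item's factor vector.
    static double dot(double[] u, double[] m) {
        double s = 0.0;
        for (int f = 0; f < u.length; f++) {
            s += u[f] * m[f];
        }
        return s;
    }

    // Score every item for one user; keep only the N best via a min-heap,
    // so the cost is O(numItems * k + numItems * log N) per user.
    static List<Scored> topN(double[] user, double[][] itemFactors, int n) {
        PriorityQueue<Scored> heap =
            new PriorityQueue<>(n, Comparator.comparingDouble((Scored s) -> s.score));
        for (int i = 0; i < itemFactors.length; i++) {
            double score = dot(user, itemFactors[i]);
            if (heap.size() < n) {
                heap.add(new Scored(i, score));
            } else if (score > heap.peek().score) {
                heap.poll();
                heap.add(new Scored(i, score));
            }
        }
        List<Scored> best = new ArrayList<>(heap);
        best.sort((a, b) -> Double.compare(b.score, a.score)); // highest score first
        return best;
    }
}
```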



On 6 March 2013 11:41, Sean Owen <srowen@gmail.com> wrote:

> Yeah that's right, he said 20 features, oops. And yes he says he's talking
> about the recs only too, so that's not right either. That seems way too
> long relative to factorization. And the factorization seems quite fast; how
> many machines, and how many iterations?
>
> I thought the shape of the computation was to cache B' (yes, whose columns
> are B rows) and multiply against the rows of A. Then again, probably wrong
> given the latest timing info.
>
>
> On Wed, Mar 6, 2013 at 10:25 AM, Josh Devins <hi@joshdevins.com> wrote:
>
> > So the 80 hour estimate is _only_ for the U*M', top-n calculation and not
> > the factorization. Factorization is on the order of 2-hours. For the
> > interested, here's the pertinent code from the ALS `RecommenderJob`:
> >
> >
> >
> http://grepcode.com/file/repo1.maven.org/maven2/org.apache.mahout/mahout-core/0.7/org/apache/mahout/cf/taste/hadoop/als/RecommenderJob.java?av=f#148
> >
> > I'm sure this can be optimised, but by an order of magnitude? Something to
> > try out, I'll report back if I find anything concrete.
> >
> >
> >
> > On 6 March 2013 11:13, Ted Dunning <ted.dunning@gmail.com> wrote:
> >
> > > Well, it would definitely not be the first time I counted incorrectly.
> > >  Anytime I do arithmetic the result should be considered suspect.  I do
> > > think my numbers are correct, but then again, I always do.
> > >
> > > But the OP did say 20 dimensions which gives me back 5x.
> > >
> > > Inclusion of learning time is a good suspect.  On the other side of the
> > > ledger, if the multiply is doing any column-wise access it is a likely
> > > performance bug.  The computation is AB'. Perhaps you refer to rows of B
> > > which are the columns of B'.
> > >
> > > Sent from my sleepy thumbs set to typing on my iPhone.
> > >
> > > On Mar 6, 2013, at 4:16 AM, Sean Owen <srowen@gmail.com> wrote:
> > >
> > > > If there are 100 features, it's more like 2.6M * 2.8M * 100 = 728 Tflops
> > > > -- I think you're missing an "M", and the features by an order of
> > > > magnitude.  That's still 1 day on an 8-core machine by this rule of thumb.
> > > >
> > > > The 80 hours is the model building time too (right?), not the time to
> > > > multiply U*M'. This is dominated by iterations when building from
> > > > scratch, and I expect took 75% of that 80 hours. So if the multiply was
> > > > 20 hours -- on 10 machines -- on Hadoop, then that's still slow but not
> > > > out of the question for Hadoop, given it's usually a 3-6x slowdown over
> > > > a parallel in-core implementation.
> > > >
> > > > I'm pretty sure what exists in Mahout here can be optimized further at
> > > > the Hadoop level; I don't know that it's doing the multiply badly
> > > > though.  In fact I'm pretty sure it's caching cols in memory, which is
> > > > a bit of 'cheating' to speed up by taking a lot of memory.
> > > >
> > > >
> > > > On Wed, Mar 6, 2013 at 3:47 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
> > > >
> > > >> Hmm... each user's recommendations seem to be about 2.8M x 20 Flops =
> > > >> 60M Flops.  You should get about a Gflop per core in Java, so this
> > > >> should take about 60 ms.  You can make this faster with more cores or
> > > >> by using ATLAS.
> > > >>
> > > >> Are you expecting 3 million unique people every 80 hours?  If not,
> > > >> then it is probably more efficient to compute the recommendations on
> > > >> the fly.
> > > >>
> > > >> How many recommendations per second are you expecting?  If you have
> > > >> 1 million uniques per day (just for grins) and we assume 20,000 s/day
> > > >> to allow for peak loading, you have to do 50 queries per second peak.
> > > >>  This seems to require 3 cores.  Use 16 to be safe.
> > > >>
> > > >> Regarding the 80 hours, 3 million x 60 ms = 180,000 seconds = 50
> > > >> hours.  I think that your map-reduce is under-performing by about a
> > > >> factor of 10.  This is quite plausible with bad arrangement of the
> > > >> inner loops.  I think that you would have highest performance
> > > >> computing the recommendations for a few thousand items by a few
> > > >> thousand users at a time.  It might be just about as fast to do all
> > > >> items against a few users at a time.  The reason for this is that
> > > >> dense matrix multiply requires c (n x k + m x k) memory ops, but
> > > >> n x k x m arithmetic ops.  If you can re-use data many times, you can
> > > >> balance memory channel bandwidth against CPU speed.  Typically you
> > > >> need 20 or more re-uses to really make this fly.
> > > >>
> > > >>
> > >
> >
>
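[Ted's re-use argument above -- score a block of users at a time so each item vector fetched from memory is reused across the whole block -- can be sketched as below. Names and shapes are illustrative, not from Mahout.]

```java
// Sketch of the data re-use argument (illustrative, not Mahout code): scoring
// a *block* of users against all items means each item vector loaded from
// memory is reused for every user in the block. With k features, a block of
// b users against m items costs roughly c (b x k + m x k) memory ops but
// b x m x k arithmetic ops, so a larger block amortizes memory bandwidth
// against CPU speed.
class BlockedScores {

    // scores[u][i] = dot(userFactors[u], itemFactors[i]), computed block-wise.
    static double[][] multiply(double[][] userFactors, double[][] itemFactors,
                               int blockSize) {
        int nUsers = userFactors.length;
        int nItems = itemFactors.length;
        int k = itemFactors[0].length;
        double[][] scores = new double[nUsers][nItems];
        for (int u0 = 0; u0 < nUsers; u0 += blockSize) {
            int u1 = Math.min(u0 + blockSize, nUsers);
            for (int i = 0; i < nItems; i++) {
                // This item row is read once per block and reused for every
                // user in [u0, u1) -- the re-use that hides memory latency.
                double[] item = itemFactors[i];
                for (int u = u0; u < u1; u++) {
                    double s = 0.0;
                    for (int f = 0; f < k; f++) {
                        s += userFactors[u][f] * item[f];
                    }
                    scores[u][i] = s;
                }
            }
        }
        return scores;
    }
}
```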
