mahout-user mailing list archives

From Serega Sheypak <serega.shey...@gmail.com>
Subject Re: mapreduce ItemSimilarity input optimization
Date Thu, 21 Aug 2014 08:22:27 GMT
>>What you are doing is best practice for showing similar “views”. The
technique for using multiple actions will be covered in a series of blog
posts and may be put on the Mahout site eventually.
Great, thanks!

>>People look at 100 things and buy 1, as you say. The question is: Do you
want people to buy something or just browse your site?
No objection to your point; I understand it. It should work for a pretty
big e-commerce site, right? A small e-commerce site sells 100-200 items per
day and has a wide range of items.

>>Filter out any items not in the catalog from your recommendations.
We already do that at the data preparation stage. We recalculate item
similarity every day over a sliding 60-day window and exclude unavailable
items during preparation.
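The daily preparation step described above (a sliding 60-day window that drops items no longer available) might look roughly like the following Python sketch. The event tuple layout and the `prepare_events` helper are hypothetical and only illustrate the filtering logic; they are not part of Mahout.

```python
from datetime import date, timedelta

def prepare_events(events, catalog, today, window_days=60):
    """Keep events that fall inside the sliding window and whose item is still in the catalog.

    `events` is a hypothetical list of (user_id, item_id, action, event_date) tuples.
    """
    start = today - timedelta(days=window_days)
    return [
        (user, item, action, day)
        for (user, item, action, day) in events
        if start <= day <= today and item in catalog
    ]

events = [
    ("u1", "iphone", "view", date(2014, 8, 20)),
    ("u2", "old-phone", "view", date(2014, 8, 19)),  # dropped: no longer in the catalog
    ("u3", "iphone", "view", date(2014, 5, 1)),      # dropped: older than 60 days
]
catalog = {"iphone", "case-xxx"}
kept = prepare_events(events, catalog, today=date(2014, 8, 21))
print(kept)
```

The surviving (user, item) pairs would then be written out as the input for the similarity job.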

Thank you! We got good results, and the business people are satisfied :)


2014-08-20 20:28 GMT+04:00 Pat Ferrel <pat@occamsmachete.com>:

> >
> > On Aug 19, 2014, at 11:26 PM, Serega Sheypak <serega.sheypak@gmail.com> wrote:
> >
> > Hi!
> > 1. There was a bug in the UI; I've checked the raw recommendations. "Water
> > heating device" has a low score. So the first 30 recommended items really fit
> > the iPhone, and the next ones are not so good. Anyway, the result is good,
> > thank you very much.
> > 2. I've inspected user "sessions": there really are people who viewed the
> > iPhone and the heating device, 10 people in the last month.
> > 3. I will calculate a relative measure. I haven't yet calculated what
> > percentage of all people these are or how they influence the score.
> >
>
> That’s great. The Spark version sorts the result by weights, but I think
> the mapreduce version doesn't.
>
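If the mapreduce output is indeed unsorted, ordering it is a small post-processing step. The Python sketch below assumes tab-separated text rows of the form `item, similar item, score`; that row layout and the `top_similar` helper are assumptions, so check them against the actual job output before relying on this.

```python
from collections import defaultdict

def top_similar(lines, n=10):
    """Group (item, similar_item, score) rows by item and keep the n best, highest score first."""
    groups = defaultdict(list)
    for line in lines:
        item, other, score = line.rstrip("\n").split("\t")
        groups[item].append((other, float(score)))
    return {
        item: sorted(pairs, key=lambda pair: pair[1], reverse=True)[:n]
        for item, pairs in groups.items()
    }

rows = [
    "iphone\twater-heater\t0.12",
    "iphone\tcase-xxx\t0.93",
    "iphone\tcase-yyy\t0.71",
]
print(top_similar(rows, n=2)["iphone"])  # [('case-xxx', 0.93), ('case-yyy', 0.71)]
```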
> > *You wrote:*
> > Then once you have that working you can add more actions, but only with
> > cross-cooccurrence; adding by *weighting will not work with this type of
> > recommender*. Which recommender can work with weights for actions?
>
> What you are doing is best practice for showing similar “views”. The
> technique for using multiple actions will be covered in a series of blog
> posts and may be put on the Mahout site eventually. It requires
> spark-itemsimilarity. For now I’d strongly suggest you look at training on
> purchase data alone - see the comments below.
>
> >
> > *About building recommendations using sales.*
> > Sales are less than 1% of item views. You will recommend only the stuff
> > people buy.
>
> The point is not volume of data but quality of data. I once measured how
> predictive of purchases the views were and found them a rather poor
> predictor. People look at 100 things and buy 1, as you say. The question
> is: Do you want people to buy something or just browse your site?
>
> On the other hand you would need to see how good your coverage is of
> purchases. Do you have enough items purchased by several people (Ted’s
> questions below will guide you)? If there is good coverage then you _do
> not_ restrict the range by using only purchase data. You increase the
> quality.
>
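One rough way to answer the coverage question is to count distinct buyers per catalog item and see what share of the catalog clears some minimum. In the Python sketch below, the `purchase_coverage` helper and the `min_buyers` threshold are hypothetical choices for illustration, not a Mahout setting.

```python
def purchase_coverage(purchases, catalog, min_buyers=3):
    """Return the fraction of catalog items bought by at least `min_buyers` distinct users,
    together with the set of those items."""
    buyers = {}
    for user, item in purchases:
        buyers.setdefault(item, set()).add(user)
    covered = {item for item in catalog if len(buyers.get(item, set())) >= min_buyers}
    return len(covered) / len(catalog), covered

purchases = [
    ("u1", "iphone"), ("u2", "iphone"), ("u3", "iphone"),
    ("u1", "case-xxx"), ("u2", "case-xxx"),
]
catalog = {"iphone", "case-xxx", "case-yyy"}
ratio, covered = purchase_coverage(purchases, catalog, min_buyers=2)
print(ratio, sorted(covered))
```

A low ratio suggests widening the time window before switching to purchase-only training.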
> > If you recommend what people view, you significantly widen the range
> > of possible purchases. People always buy case "XXX" with the iPhone, so you
> > would never recommend that they buy case "YYY". If people view both "XXX"
> > and "YYY", it's reasonable to recommend "YYY". Maybe "YYY" is more
> > expensive, and that is why people prefer the cheaper "XXX". What's wrong
> > with this assumption?
>
> Nothing at all. Remember that your goal is to cause a purchase but using
> views requires some “scrubbing” of views. You want, in effect,
> views-that-lead-to-purchases. In a cooccurrence recommender this can be
> done with cross-cooccurrence. I’ll describe that elsewhere; it’s too long
> to describe in an email but pretty easy to use.
>
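The email defers the details, but the core of cross-cooccurrence, as I understand Mahout's description of `spark-itemsimilarity`, is the product P'V of a user-by-purchased-item matrix P and a user-by-viewed-item matrix V. The Python sketch below only computes those raw counts; scoring them (Mahout uses a log-likelihood ratio test) and downsampling are left out, and the `cross_cooccurrence` helper is hypothetical.

```python
from collections import Counter

def cross_cooccurrence(purchases, views):
    """Count, for each (purchased item, viewed item) pair, how many users did both.

    These are the raw entries of P'V with binary P and V; a real recommender would
    still score the counts (e.g. with LLR) before using them as indicators.
    """
    bought, viewed = {}, {}
    for user, item in purchases:
        bought.setdefault(user, set()).add(item)
    for user, item in views:
        viewed.setdefault(user, set()).add(item)
    counts = Counter()
    for user, items in bought.items():
        for a in items:
            for b in viewed.get(user, ()):
                counts[(a, b)] += 1
    return counts

purchases = [("u1", "case-xxx"), ("u2", "case-xxx")]
views = [("u1", "case-yyy"), ("u1", "case-xxx"), ("u2", "case-yyy"), ("u3", "case-yyy")]
counts = cross_cooccurrence(purchases, views)
print(counts[("case-xxx", "case-yyy")])  # 2: two buyers of "XXX" also viewed "YYY"
```

Only views by users who went on to buy contribute, which is the "scrubbing" of views described above.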
> I’d wager that if you restrict to purchases your sales will go up over
> recommending views. But that is without looking at your data. If you need
> more data, try increasing the sliding time window to add more purchases.
> This will eventually start including things that are no longer in your
> catalog, so it will have diminishing returns, but 60 days seems like a short time period.
> Filter out any items not in the catalog from your recommendations.
>
> You want recency to matter, this is good intuition. The in-catalog filter
> is one simple way, and there are others when you get to personalization.
>
> >
> > *About our obsessive desire to add weights for actions.*
> > We would like to self-tune our recommendations. If a user clicks our
> > recommendation, it's a signal for us that the items are related. So next
> > time this link should have a higher score. What are the approaches to do it?
> >
>
> Yes, you do want the things that lead to purchases to go into the training
> data. This is good intuition. But you don’t do it with weights; you train on
> new purchases, regardless of whether they came from random views,
> rec-views, or … You don’t care whether a rec was clicked on; you care if a
> purchase was made and you don’t care what part of the UI caused it. UI
> analysis is very very important but doesn’t help the recommender, it guides
> UI decisions. So measuring clicks is good but shouldn’t be used to change
> recs.
>
> One way to increase the value of your recs is to add a little randomness
> to their ordering. If you have 10 things to recommend get 20 from
> itemsimilarity and apply a normally distributed random weighting, then
> re-sort and show the top 10. This will move some items up in the order and
> show them where, without the re-ordering, they would never be shown. The
> technique allows you to expose more items to possible purchase and
> therefore affect the ordering the next time you train. The actual algorithm
> takes more space to describe but the idea is a lot like a multi-armed
> bandit where the best bandit eventually gets all trials. In this case the
> best rec leads to a purchase and gets into the new training data and so
> will be shown more often the next time.
>
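The dithering idea above (fetch 20, perturb with normally distributed noise, re-sort, show 10) can be sketched in Python as follows. The exact weighting is not given in the email; this sketch assumes one common variant in which Gaussian noise is added to the logarithm of each item's rank, and the `dither` helper and the `sigma` value are hypothetical.

```python
import math
import random

def dither(ranked_items, show=10, sigma=0.5, rng=None):
    """Re-order a best-first list with Gaussian noise and return the first `show` items.

    Assumed variant: the item at 1-based rank r gets the key log(r) + N(0, sigma),
    and items are re-sorted by that key (smaller is better). Lower-ranked items
    occasionally move up, so they get exposure and a chance to earn purchases.
    """
    rng = rng or random.Random()
    keyed = [
        (math.log(rank) + rng.gauss(0.0, sigma), item)
        for rank, item in enumerate(ranked_items, start=1)
    ]
    keyed.sort(key=lambda pair: pair[0])
    return [item for _, item in keyed[:show]]

candidates = [f"item-{i}" for i in range(1, 21)]  # 20 items from itemsimilarity, best first
print(dither(candidates, show=10, rng=random.Random(42)))
```

Using the log of the rank means the top few positions stay fairly stable while the tail gets shuffled more, and a larger `sigma` increases exploration.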
> Another thing you can do is create a “shopping cart” recommender. This
> looks at items purchased together—an item-set. It is a strong indicator of
> relatedness.
>
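A simple way to get at item-sets is to count how often two items appear in the same order. The Python sketch below does only that pair counting; treating each order as if it were a "user" and feeding it to the same cooccurrence job is an assumption about how one might build the cart recommender, not something the email specifies.

```python
from collections import Counter
from itertools import combinations

def basket_pairs(orders):
    """Count how many orders contain each unordered pair of distinct items."""
    counts = Counter()
    for items in orders:
        # sort so that ("a", "b") and ("b", "a") are counted as the same pair
        for a, b in combinations(sorted(set(items)), 2):
            counts[(a, b)] += 1
    return counts

orders = [["iphone", "case-xxx"], ["iphone", "case-xxx", "charger"], ["charger"]]
pairs = basket_pairs(orders)
print(pairs[("case-xxx", "iphone")])  # 2
```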
> Suggestions:
> 1) personalize: this is likely to make the most difference since you will
> be showing different things to different people. The “Practical Machine
> Learning” book is short and easy to read, and it describes this.
> 2) move to purchase data training, wait for cross-cooccurrence to add in
> view data. Do this if you have good coverage (Ted’s questions below relate
> to this).
> 3) increase the training period if needed to get good catalog coverage
> 4) consider dithering your recs to expose more items to purchase and
> therefore self-tune by increasing the quality of your training data.
>
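For the personalization step, the basic idea is to combine a user's own history with the item-to-item similarity lists. As far as I know, the book does this by indexing indicators in a search engine and querying with the user's history; the Python sketch below swaps that out for a plain sum of similarity scores. The `personalize` helper and the input layout are hypothetical.

```python
from collections import Counter

def personalize(history, similar, top=10):
    """Rank items for one user from their history and per-item similarity lists.

    `similar` maps an item to a list of (other_item, score) pairs, e.g. parsed
    itemsimilarity output. Each candidate's score is the sum of its similarities
    to the items in the user's history; items already in the history are skipped.
    """
    seen = set(history)
    scores = Counter()
    for item in seen:
        for other, score in similar.get(item, []):
            if other not in seen:
                scores[other] += score
    return [item for item, _ in scores.most_common(top)]

similar = {
    "iphone": [("case-xxx", 0.9), ("charger", 0.6)],
    "case-xxx": [("charger", 0.5), ("screen-film", 0.4)],
}
print(personalize(["iphone", "case-xxx"], similar))
```

Two users who viewed the same product but have different histories get different lists, which is the point of personalizing.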
> >
> >
> >
> > 2014-08-20 7:18 GMT+04:00 Ted Dunning <ted.dunning@gmail.com>:
> >
> >> On Tue, Aug 19, 2014 at 12:53 AM, Serega Sheypak <serega.sheypak@gmail.com>
> >> wrote:
> >>
> >>> What could be the reason for recommending "Water heat device" for the
> >>> iPhone? The iPhone is one of the most popular items. Should there really
> >>> be a lot of people viewing the iPhone together with "Water heat device"?
> >>>
> >>
> >> What are the numbers?
> >>
> >> How many people got each item?  How many people total?  How many people
> >> got both?
> >>
> >> What about the same for the iPhone related items?
> >>
> >
>
