predictionio-user mailing list archives

From Pat Ferrel <...@occamsmachete.com>
Subject Re: Similar product template
Date Thu, 13 Apr 2017 15:30:41 GMT
I’m surprised that ALS seemed clear because it is based on a complicated matrix factorization
algorithm that transforms the user vectors into a smaller dimensional space that is composed
of “important” features. These are not interactions with items like “buys”, they can
only be described as defining a new feature space. The factorized matrices transform in and
out of that space. The factorized matrices are approximations of user x features, and features
x items.

The user’s history is transformed into the feature space, which will be dense, in other
words indicating some preference for all features. Then when this dense user vector is transformed
back into item space the approximation nature of ALS will give some preference value for all
items. At this point they can be ranked by score and the top few returned. This is clearly
wrong since a user will never have a preference for all items and would never purchase or convert
on a large number of them no matter what the circumstances. It does give good results for the
top ranked though when you have lots of “conversions” per user on average because ALS
can only use conversions as input. In other words it can use only one kind of behavior data.
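
The scoring step described above can be sketched as follows. This is only an
illustration, not MLlib code: the factor values, item ids, and function names are
invented, and the factors are assumed to have been computed already. It shows that
multiplying a dense user feature vector by the features x items matrix gives a
score for every item, which is then ranked and cut to the top few.

```python
# Hypothetical, hand-picked factors: 2 latent features, 4 items.
user_features = [0.9, 0.2]        # one user's row of the user x features matrix
features_x_items = [
    [1.0, 0.8, 0.1, 0.05],        # weight of feature 0 for each item
    [0.1, 0.3, 0.9, 0.70],        # weight of feature 1 for each item
]
item_ids = ["a", "b", "c", "d"]


def score_all_items(user_vec, feat_items):
    """Dot product of the user vector with every item column."""
    n_items = len(feat_items[0])
    return [
        sum(user_vec[f] * feat_items[f][i] for f in range(len(user_vec)))
        for i in range(n_items)
    ]


def top_k(scores, ids, k):
    """Rank items by score, highest first, and keep the first k."""
    ranked = sorted(zip(ids, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


scores = score_all_items(user_features, features_x_items)
top = top_k(scores, item_ids, 2)
```

Every entry of `scores` is non-zero here, even for items the user never touched,
which is the "some preference value for all items" effect; only the ranking of the
top few is meaningful.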

The CCO (Correlated Cross-Occurrence) algorithm from Mahout that is behind the Universal Recommender
is multi-domain and multi-modal, in that it takes interactions of the user from many actions
they perform and even contextual data like profile info or location. It takes all of these
“indicators”, a name for these interactions or other user info, and compares
them with the user’s conversions. It does this for all users and so finds which of the indicators
most often lead to conversion. These highly correlated indicators are then associated with
items as properties. When a user recommendation is needed we see which items have the
behavioral indicators most similar to the user's history. This tells us that the user probably
has an affinity for the item—we can predict a preference for these items.
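
A toy sketch of that idea, in plain Python rather than Mahout: for each
(indicator, converted item) pair it counts users in a 2x2 table and keeps the pair
if a log-likelihood ratio (LLR) test scores it above a threshold. The data, the
threshold of 3.0, and the function names are assumptions made for illustration;
the real implementation works on sparse matrices in Spark and handles several
indicator types at once.

```python
from math import log


def x_log_x(x):
    return 0.0 if x == 0 else x * log(x)


def entropy(*counts):
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 contingency table of user counts."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))


def build_indicators(conversions, secondary, threshold=3.0):
    """Map each converted-on item to the secondary items correlated with it."""
    users = set(conversions) | set(secondary)
    conv_items = set().union(*conversions.values())
    sec_items = set().union(*secondary.values())
    model = {item: set() for item in conv_items}
    for target in conv_items:
        converted = {u for u in users if target in conversions.get(u, ())}
        for ind in sec_items:
            acted = {u for u in users if ind in secondary.get(u, ())}
            k11 = len(acted & converted)      # did both
            if k11 == 0:
                continue                      # never cross-occurred, skip
            k12 = len(acted - converted)      # indicator only
            k21 = len(converted - acted)      # conversion only
            k22 = len(users) - k11 - k12 - k21
            if llr(k11, k12, k21, k22) >= threshold:
                model[target].add(ind)
    return model


def recommend(model, history, k=10):
    """Rank items by how many of their indicators appear in the history."""
    scored = [(len(inds & history), item) for item, inds in model.items()]
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [item for _, item in ranked[:k]]


# Invented data: who bought what, and who viewed what.
buys = {"u1": {"ipad"}, "u2": {"ipad"}, "u3": {"nexus"}, "u4": {"nexus"}}
views = {"u1": {"case"}, "u2": {"case"}, "u3": {"charger"}, "u4": {"charger"}}

model = build_indicators(buys, views)
# A new user with no purchases, only a view.
recs = recommend(model, {"charger"})
```

The last two lines are the interesting part: the new user has no conversions at
all, yet still gets a recommendation because a view matches an indicator attached
to an item.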

The differences:
1) ALS can ingest only one type of behavior. This is not bad but also not very flexible and
requires a good number of these interactions per user.
2) Cross-behavioral recommendations cannot be made with ALS since no cross-behavioral data
is seen by it. This in turn means that users with few or no conversions will not get recommendations.
The Universal Recommender can make recommendations to users with no conversions if they have
other behavior to draw from so it is generally said to handle cool-start for users better.
Another way to say this is that “cold-start” for ALS is only “cool-start” for CCO
(in the UR). The same goes for item-based recommendations.
3) CCO can also use content directly for similar item recommendations, which helps solve the
item “cold-start” problem. ALS cannot.
4) CCO is more like a landscape of Predictive AI algorithms using all we know about a user
from multiple domains (conversions, page views, search terms, category preferences, tag preferences,
brand preferences, location, device used, etc) to make predictions in some specific domain.
It can also work with conversions alone.
5) Doing queries with ALS in MLlib requires that the factorized matrices be in memory.
They are much smaller than the input but this means running Spark to make queries. This makes
it rather heavy-weight for queries and makes scaling a bit of a problem and fairly complicated
(too much to explain here). CCO on the other hand uses Spark only to create the indicators
model, which it puts in Elasticsearch. Elasticsearch finds the top ranked items compared to
the user’s history at runtime in real-time.  This makes scaling queries as easy as scaling
Elasticsearch since it was meant to scale.
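
To make point 5 a little more concrete, here is a rough sketch of what a
history-based query body could look like. It is not the Universal Recommender's
actual query: the field names ("purchase", "view") and the helper name are
hypothetical, and it assumes each item document stores its indicators in one field
per indicator type. It only uses the standard Elasticsearch `bool`, `should`, and
`terms` query clauses, and only builds the request body without sending it.

```python
def build_history_query(history, size=10):
    """Build a bool/should query matching a user's history against
    per-indicator fields on item documents (field names are hypothetical)."""
    should = [
        {"terms": {field: sorted(values)}}
        for field, values in sorted(history.items())
        if values
    ]
    return {"size": size, "query": {"bool": {"should": should}}}


# Invented history for one user: one purchase and two views.
query = build_history_query(
    {"purchase": {"ipad"}, "view": {"case", "charger"}}, size=5
)
```

Items matching more of the `should` clauses score higher, so the search engine's
own ranking does the work of returning the top items, with no Spark job involved
at query time.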

I have done cross-validation comparisons but they are a bit unfair and the winner depends on
the dataset. In real life CCO serves more users than ALS since it uses more behavior and so
tends to win for this reason. It’s nearly impossible to compare this with cross-validation
so A/B tests are our only metric.

We have a slide deck showing some of these comparisons here: https://docs.google.com/presentation/d/1HpHZZiRmHpMKtu86rOKBJ70cd58VyTOUM1a8OmKSMTo/edit?usp=sharing


On Apr 13, 2017, at 2:39 AM, Dennis Honders <dennishonders@gmail.com> wrote:

Hello, 

I was using the similar product template. (I'm not a data scientist)
The template is using the ALS algorithm and the Cooccurrence algorithm. 

The ALS algorithm is described quite well on the Apache Spark MLlib website. The Apache Mahout
documentation describes the cooccurrence algorithm only in general terms and it is not clear
what the differences are between these algorithms. They both use matrices to describe relations
but use a different approach to factorize the matrices?

I also like to know a bit more about the parameters of both algorithms, in the engine.json.
What could be the impact of changing the values?
ALS: rank, nIterations, lambda and seed. 
Cooccurrence: "n" 
The algorithms bring different results. Is there a general way of comparing these results?


Greetings,

Dennis

