mahout-commits mailing list archives

From: p..@apache.org
Subject: svn commit: r1665101 - /mahout/site/mahout_cms/trunk/content/users/recommender/intro-cooccurrence-spark.mdtext
Date: Mon, 09 Mar 2015 00:19:51 GMT
Author: pat
Date: Mon Mar  9 00:19:50 2015
New Revision: 1665101

URL: http://svn.apache.org/r1665101
Log:
fixed some wording

Modified:
    mahout/site/mahout_cms/trunk/content/users/recommender/intro-cooccurrence-spark.mdtext

Modified: mahout/site/mahout_cms/trunk/content/users/recommender/intro-cooccurrence-spark.mdtext
URL: http://svn.apache.org/viewvc/mahout/site/mahout_cms/trunk/content/users/recommender/intro-cooccurrence-spark.mdtext?rev=1665101&r1=1665100&r2=1665101&view=diff
==============================================================================
--- mahout/site/mahout_cms/trunk/content/users/recommender/intro-cooccurrence-spark.mdtext
(original)
+++ mahout/site/mahout_cms/trunk/content/users/recommender/intro-cooccurrence-spark.mdtext
Mon Mar  9 00:19:50 2015
@@ -1,14 +1,8 @@
 #Intro to Cooccurrence Recommenders with Spark
 
-Mahout's next generation recommender is based on the proven cooccurrence algorithm but takes it several important steps further
-by creating a multimodal recommender, which can make use of many user actions to make recommendations. In the old days 
-only page reads, or purchases could be used alone. Now search terms, locations, all manner of clickstream data can be used to 
-recommend - hence the term multimodal. It also allows the recommendations to be tuned for the placement context by changine 
-the query without recalculating the model - adding to its multimodality.
-
 Mahout provides several important building blocks for creating recommendations using Spark. *spark-itemsimilarity* can 
 be used to create "other people also liked these things" type recommendations and paired with a search engine can 
-personalize multimodal recommendations for individual users. *spark-rowsimilarity* can provide non-personalized content based 
+personalize recommendations for individual users. *spark-rowsimilarity* can provide non-personalized content based 
 recommendations and when paired with a search engine can be used to personalize content based recommendations.
 
 ![image](http://s6.postimg.org/r0m8bpjw1/recommender_architecture.png)
@@ -22,11 +16,10 @@ User history is used as a query on the i
 ##References
 
 1. A free ebook, which talks about the general idea: [Practical Machine Learning](https://www.mapr.com/practical-machine-learning)
-2. A slide deck, which talks about mixing user actions and other indicators: [Multimodal Streaming Recommender](http://occamsmachete.com/ml/2014/10/07/creating-a-unified-recommender-with-mahout-and-a-search-engine/)
+2. A slide deck, which talks about mixing actions or other indicators: [Creating a Unified Recommender](http://occamsmachete.com/ml/2014/10/07/creating-a-unified-recommender-with-mahout-and-a-search-engine/)
 3. Two blog posts: [What's New in Recommenders: part #1](http://occamsmachete.com/ml/2014/08/11/mahout-on-spark-whats-new-in-recommenders/)
 and  [What's New in Recommenders: part #2](http://occamsmachete.com/ml/2014/09/09/mahout-on-spark-whats-new-in-recommenders-part-2/)
-4. A post describing the loglikelihood ratio:  [Surprise and Coinsidense](http://tdunning.blogspot.com/2008/03/surprise-and-coincidence.html) LLR is used to reduce noise in the data while keeping the calculations O(n) complexity.
-5. A demo [Video Guide][1] site, which uses many of the techniques described above.
+4. A post describing the loglikelihood ratio:  [Surprise and Coincidence](http://tdunning.blogspot.com/2008/03/surprise-and-coincidence.html) LLR is used to reduce noise in the data while keeping the calculations O(n) complexity.
 
 Below are the command line jobs but the drivers and associated code can also be customized and accessed from the Scala APIs.
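
For anyone reading this commit who wants to try the command line form, here is a minimal sketch. It is illustrative only: the input file name is invented, and the option names are assumed to match the spark-itemsimilarity driver documented later in this page, so check `mahout spark-itemsimilarity --help` against your Mahout version.

    # Sketch: build a cooccurrence indicator for "purchase" and a cross-cooccurrence
    # indicator for "view" from a delimited log of (userID, action, itemID) tuples.
    # user-actions.tdf is a hypothetical input file.
    mahout spark-itemsimilarity \
        --input user-actions.tdf \
        --output indicator-matrices/ \
        --master local[4] \
        --filter1 purchase \
        --filter2 view \
        --rowIDColumn 0 \
        --filterColumn 1 \
        --itemIDColumn 2

The output directories of indicator text files are what would be indexed by the search engine in the architecture pictured above.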
 
@@ -320,11 +313,11 @@ the only similarity method supported thi
 LLR is used more as a quality filter than as a similarity measure. However *spark-rowsimilarity* will produce 
 lists of similar docs for every doc if input is docs with lists of terms. The Apache [Lucene](http://lucene.apache.org) project provides several methods of [analyzing and tokenizing](http://lucene.apache.org/core/4_9_0/core/org/apache/lucene/analysis/package-summary.html#package_description) documents.
 
-#<a name="unified-recommender">4. Creating a Unified Recommender</a>
+#<a name="unified-recommender">4. Creating a Multimodal Recommender</a>
 
-Using the output of *spark-itemsimilarity* and *spark-rowsimilarity* you can build a unified cooccurrence and content based
+Using the output of *spark-itemsimilarity* and *spark-rowsimilarity* you can build a multimodal cooccurrence and content based
  recommender that can be used in both or either mode depending on indicators available and the history available at 
-runtime for a user.
+runtime for a user. Some slides describing this method can be found [here](http://occamsmachete.com/ml/2014/10/07/creating-a-unified-recommender-with-mahout-and-a-search-engine/).
 
 ##Requirements
 
@@ -381,6 +374,8 @@ items with the most similar tags. Notice
 content or metadata indicator. They are used when you want to find items that are similar to other items by using their 
 content or metadata, not by which users interacted with them.
 
+**Note**: It may be advisable to treat tags as cross-cooccurrence indicators but for the sake of an example they are treated here as content only.
+
 For this we need input of the form:
 
     itemID<tab>list-of-tags
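
As a purely hypothetical sample of that input, using the page's own `<tab>` placeholder for the delimiter (item IDs and tags invented for illustration):

    iphone<tab>phone apple electronics
    ipad<tab>tablet apple electronics
    galaxy-tab<tab>tablet samsung electronics

A file in this form is what *spark-rowsimilarity* would consume to produce the tags indicator.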
@@ -408,10 +403,9 @@ This is a content indicator since it has
     
 We now have three indicators, two collaborative filtering type and one content type.
 
-##Unified Recommender Query
+##Multimodal Recommender Query
 
-The actual form of the query for recommendations will vary depending on your search engine but the intent is the same. 
-For a given user, map their history of an action or content to the correct indicator field and perform an OR'd query. 
+The actual form of the query for recommendations will vary depending on your search engine but the intent is the same. For a given user, map their history of an action or content to the correct indicator field and perform an OR'd query. 
 
 We have 3 indicators, these are indexed by the search engine into 3 fields, we'll call them
"purchase", "view", and "tags". 
 We take the user's history that corresponds to each indicator and create a query of the form:
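
To make that concrete, a hypothetical query in Lucene/Solr syntax might look like the following; the field names "purchase", "view", and "tags" come from the sentence above, while the item IDs and the boost factor are invented:

    purchase:(iphone ipad)^2 OR view:(iphone galaxy-tab) OR tags:(electronics phone)

The user's history for each indicator goes into the matching field, and the collaborative filtering fields can be boosted as note 3 below suggests.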
@@ -443,6 +437,3 @@ This will return recommendations favorin
 2. Content can be used where there is no recorded user behavior or when items change too quickly to get much interaction history. They can be used alone or mixed with other indicators.
 3. Most search engines support "boost" factors so you can favor one or more indicators. In the example query, if you want tags to only have a small effect you could boost the CF indicators.
 4. In the examples we have used space delimited strings for lists of IDs in indicators and in queries. It may be better to use arrays of strings if your storage system and search engine support them. For instance Solr allows multi-valued fields, which correspond to arrays.
-
-
-  [1]: https://guide.finderbots.com
\ No newline at end of file


