mahout-commits mailing list archives

From sro...@apache.org
Subject svn commit: r803106 - /lucene/mahout/site/src/documentation/content/xdocs/taste.xml
Date Tue, 11 Aug 2009 13:13:35 GMT
Author: srowen
Date: Tue Aug 11 13:13:34 2009
New Revision: 803106

URL: http://svn.apache.org/viewvc?rev=803106&view=rev
Log:
More doc updates following MAHOUT-158

Modified:
    lucene/mahout/site/src/documentation/content/xdocs/taste.xml

Modified: lucene/mahout/site/src/documentation/content/xdocs/taste.xml
URL: http://svn.apache.org/viewvc/lucene/mahout/site/src/documentation/content/xdocs/taste.xml?rev=803106&r1=803105&r2=803106&view=diff
==============================================================================
--- lucene/mahout/site/src/documentation/content/xdocs/taste.xml (original)
+++ lucene/mahout/site/src/documentation/content/xdocs/taste.xml Tue Aug 11 13:13:34 2009
@@ -48,9 +48,7 @@
 
 <p>A <code>Recommender</code> is the core abstraction in Taste. Given a
<code>DataModel</code>, it can produce
 recommendations. Applications will most likely use the <code>GenericUserBasedRecommender</code>
implementation
-or <code>GenericItemBasedRecommender</code>, possibly decorated by
-
-<code>CachingRecommender</code>.</p>
+or <code>GenericItemBasedRecommender</code>, possibly decorated by <code>CachingRecommender</code>.</p>
 
 </section>
 
@@ -61,11 +59,16 @@
 to access preference data from a database via JDBC, though many applications will want to
write their own.
 Taste also provides a <code>FileDataModel</code>.</p>
 
-<p>Along with <code>DataModel</code>, Taste uses the <code>User</code>,
<code>Item</code> and
-<code>Preference</code> abstractions to represent the users, items, and preferences
for those items in the
-recommendation engine. Custom <code>DataModel</code> implementations would return
implementations of these
-interfaces that are appropriate to the application - maybe an <code>OnlineUser</code>
implementation
-that represents an online store user, and a <code>BookItem</code> implementation
representing a book.</p>
+<p>There are no abstractions for a user or item in the object model (not anymore).
Users and items are identified
+solely by an ID value in the framework. Further, this ID value must be numeric; it is a Java
<code>long</code>
+type through the APIs. A <code>Preference</code> object or <code>PreferenceArray</code>
object encapsulates
+the relation between user and preferred items (or items and users preferring them).</p>
+
+<p>Finally, Taste supports, in various ways, a so-called "boolean" data model in which
users do not express
+preferences of varying strengths for items, but simply express an association or none at
all. For example, while 
+users might express a preference from 1 to 5 in the context of a movie recommender site,
there may be no
+notion of a preference value between users and pages in the context of recommending pages
on a web site: there
+is only a notion of an association, or none, between a user and pages that have been visited.</p>
 
 </section>
 
@@ -160,11 +163,11 @@
 
 <p>User-based recommenders are the "original", conventional style of recommender system.
They can produce good
 recommendations when tweaked properly; they are not necessarily the fastest recommender systems
and
-are thus suitable for small data sets (roughly, less than a million ratings). We'll start
with an example of this.</p>
+are thus suitable for small data sets (roughly, less than ten million ratings). We'll start
with an example of this.</p>
 
 <p>First, create a <code>DataModel</code> of some kind. Here, we'll use
a simple one based
 on data in a file. The file should be in CSV format, with lines of the form <code>userID,itemID,prefValue</code>
-(e.g. "AB39505,290002,3.5"):</p>
+(e.g. "39505,290002,3.5"):</p>
 
 <pre>DataModel model = new FileDataModel(new File("data.txt"));
 </pre>
@@ -193,7 +196,7 @@
 <p>Now we can get 10 recommendations for user ID "1234" &#8212; done!</p>
 
 <pre>List&lt;RecommendedItem&gt; recommendations =
-  cachingRecommender.recommend("1234", 10);
+  cachingRecommender.recommend(1234, 10);
 </pre>
 
 </section>
@@ -232,7 +235,7 @@
 Recommender cachingRecommender = new CachingRecommender(recommender);
 ...
 List&lt;RecommendedItem&gt; recommendations =
-  cachingRecommender.recommend("1234", 10);
+  cachingRecommender.recommend(1234, 10);
 </pre>
 
 </section>
@@ -301,12 +304,13 @@
   <li><code>-server</code>: Enables the server VM, which is generally appropriate
for long-running,
   computation-intensive applications.</li>
   <li><code>-Xms1024m -Xmx1024m</code>: Make the heap as big as possible
-- a gigabyte doesn't hurt when dealing
-  with millions of preferences. Taste will generally use as much memory as you give it for
caching, which helps
+  with tens of millions of preferences. Taste will generally use as much memory as you give
it for caching, which helps
   performance. Set the initial and max size to the same value to avoid wasting time growing
the
   heap, and to avoid having the JVM run minor collections to avoid growing the heap, which
will clear
   cached values.</li>
   <li><code>-da -dsa</code>: Disable all assertions.</li>
-  <li><code>-XX:+UseParallelGC</code> (multi-processor machines only):
Use a GC algorithm designed to take
+  <li><code>-XX:NewRatio=9</code>: Increase heap allocated to 'old' objects,
which is most of them in this framework.</li>
+  <li><code>-XX:+UseParallelGC -XX:+UseParallelOldGC</code> (multi-processor
machines only): Use a GC algorithm designed to take
   advantage of multiple processors, and designed for throughput. This is a default in J2SE
5.0.</li>
   <li><code>-XX:+DisableExplicitGC</code>: Disable calls to <code>System.gc()</code>.
These calls can only
   hurt in the presence of modern GC algorithms; they may force Taste to remove cached data
needlessly.
@@ -352,6 +356,20 @@
 double evaluation = evaluator.evaluate(builder, myModel, 0.9, 1.0);
 </pre>
 
+<p>For "boolean" data model situations, where there are no notions of preference value,
the above evaluation
+based on estimated preference does not make sense. In this case, try this kind of evaluation,
which presents
+traditional information retrieval figures like precision and recall, which are more meaningful:</p>
+
+<pre>
+...
+RecommenderIRStatsEvaluator evaluator =
+  new GenericRecommenderIRStatsEvaluator();
+IRStatistics stats =
+  evaluator.evaluate(builder, myModel, null, 3,
+                     GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD,
+                     1.0);
+</pre>
+
 </section>
 
 </section>
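
Note on the precision and recall figures mentioned in the last hunk: the sketch below shows, in plain Java, how precision and recall at N can be computed for a "boolean" data model where users and items are identified only by long IDs. It is a minimal illustration and not Taste's own evaluator; the class name IrStatsSketch, the method names, and the sample IDs are all hypothetical, and the exact definitions used by the framework's evaluator may differ.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Taste: precision and recall at N
// over long item IDs, with no preference values involved.
final class IrStatsSketch {

  // Counts how many of the first n recommended IDs are in the relevant set.
  static int hitsAtN(List<Long> recommended, Set<Long> relevant, int n) {
    int considered = Math.min(n, recommended.size());
    int hits = 0;
    for (int i = 0; i < considered; i++) {
      if (relevant.contains(recommended.get(i))) {
        hits++;
      }
    }
    return hits;
  }

  // Fraction of the (at most n) recommended items that are relevant.
  static double precisionAtN(List<Long> recommended, Set<Long> relevant, int n) {
    int considered = Math.min(n, recommended.size());
    if (considered == 0) {
      return 0.0;
    }
    return (double) hitsAtN(recommended, relevant, n) / considered;
  }

  // Fraction of the relevant items that appear in the top n recommendations.
  static double recallAtN(List<Long> recommended, Set<Long> relevant, int n) {
    if (relevant.isEmpty()) {
      return 0.0;
    }
    return (double) hitsAtN(recommended, relevant, n) / relevant.size();
  }

  public static void main(String[] args) {
    // Made-up data: items recommended to one user, and the items that
    // user is actually associated with (for example, pages visited).
    List<Long> recommended = Arrays.asList(290002L, 290005L, 290009L);
    Set<Long> relevant =
        new HashSet<>(Arrays.asList(290002L, 290009L, 290011L, 290013L));

    System.out.println("precision@3 = " + precisionAtN(recommended, relevant, 3));
    System.out.println("recall@3    = " + recallAtN(recommended, relevant, 3));
  }
}
```

In this made-up data, two of the three recommended items are relevant, and two of the four relevant items were recommended. Because only set membership is checked, no preference value is needed, which is why these figures suit the boolean case.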


