mahout-commits mailing list archives

Subject svn commit: r945539 - in /websites/staging/mahout/trunk/content: ./ users/classification/logistic-regression.html
Date Sun, 29 Mar 2015 18:54:05 GMT
Author: buildbot
Date: Sun Mar 29 18:54:05 2015
New Revision: 945539

Staging update by buildbot for mahout

    websites/staging/mahout/trunk/content/   (props changed)

Propchange: websites/staging/mahout/trunk/content/
--- cms:source-revision (original)
+++ cms:source-revision Sun Mar 29 18:54:05 2015
@@ -1 +1 @@

Modified: websites/staging/mahout/trunk/content/users/classification/logistic-regression.html
--- websites/staging/mahout/trunk/content/users/classification/logistic-regression.html (original)
+++ websites/staging/mahout/trunk/content/users/classification/logistic-regression.html Sun
Mar 29 18:54:05 2015
@@ -261,8 +261,10 @@ production fraud detection and advertisi
 The Mahout implementation uses Stochastic Gradient Descent (SGD) to allow
 large training sets to be used.</p>
 <p>For a more detailed analysis of the approach, have a look at the <a href=";language=en">thesis
-by Paul Komarek</a>.</p>
+by Paul Komarek</a> [1].</p>
 <p>See MAHOUT-228 for the main JIRA issue for SGD.</p>
+<p>A more detailed overview of the Mahout Logistic Regression classifier and a <a href="">detailed
description of building a Logistic Regression classifier</a> for the classic <a href="">Iris
flower dataset</a> are also available [2].</p>
+<p>An example of training a Logistic Regression classifier on the <a href="">UCI
Bank Marketing Dataset</a> can be found <a href="">on
the Mahout website</a> [3].</p>
 <p><a name="LogisticRegression-Parallelizationstrategy"></a></p>
 <h2 id="parallelization-strategy">Parallelization strategy</h2>
 <p>The bad news is that SGD is an inherently sequential algorithm.  The good
@@ -298,7 +300,7 @@ include</p>
 <p><a name="LogisticRegression-Featurevectorencoding"></a></p>
-<h3 id="feature-vector-encoding">Feature vector encoding</h3>
+<h2 id="feature-vector-encoding">Feature vector encoding</h2>
 <p>Because the SGD algorithms need to have fixed length feature vectors and
 because it is a pain to build a dictionary ahead of time, most SGD
 applications use the hashed feature vector encoding system that is rooted
@@ -317,7 +319,7 @@ case you are getting your training data
 <p>Here is a class diagram for the encoders package:</p>
 <p><img alt="class diagram" src="../../images/vector-class-hierarchy.png" /></p>
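The hashed feature vector encoding described above can be sketched in a few lines. This is an illustration of the hashing trick itself (fixed-length vectors with no up-front dictionary), not Mahout's encoder API; the feature names and vector size are made up for the example:

```python
import hashlib

def hashed_encode(features, num_features=100):
    """Encode a dict of feature-name -> value into a fixed-length
    vector by hashing each name to an index (the hashing trick).
    Collisions simply add into the same slot."""
    vec = [0.0] * num_features
    for name, value in features.items():
        # Hash the feature name to a stable index in [0, num_features)
        h = int(hashlib.md5(name.encode("utf-8")).hexdigest(), 16)
        vec[h % num_features] += value
    return vec

# Encode a toy example with a deliberately small vector to show
# that the output length is fixed regardless of the feature set.
v = hashed_encode({"word:cheap": 1.0, "word:offer": 1.0, "len": 42.0},
                  num_features=8)
```

Because values are summed into hashed slots, no dictionary pass over the data is needed, which is what makes this encoding attractive for streaming SGD training.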
 <p><a name="LogisticRegression-SGDLearning"></a></p>
-<h3 id="sgd-learning">SGD Learning</h3>
+<h2 id="sgd-learning">SGD Learning</h2>
 <p>For the simplest applications, you can construct an
 OnlineLogisticRegression and be off and running.  Typically, though, it is
 nice to have running estimates of performance on held out data.  To do
@@ -338,6 +340,12 @@ so that you don't have to.</p>
 the number of twiddlable knobs is pretty large.  For some examples, see the
 TrainNewsGroups example code.</p>
 <p><img alt="sgd class diagram" src="../../images/sgd-class-hierarchy.png" /></p>
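The sequential learning step that the SGD section describes can be sketched as follows. This is a from-scratch illustration of online logistic regression trained one example at a time, not Mahout's OnlineLogisticRegression Java API; the learning rate and the toy dataset are assumptions for the demo:

```python
import math
import random

class OnlineLogistic:
    """Minimal online logistic regression: one SGD update per
    example, which is why the algorithm is inherently sequential."""

    def __init__(self, num_features, rate=0.1):
        self.w = [0.0] * num_features
        self.b = 0.0
        self.rate = rate

    def predict(self, x):
        # Sigmoid of the linear score
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        return 1.0 / (1.0 + math.exp(-z))

    def train(self, target, x):
        # Gradient of the log-loss for a single example
        err = target - self.predict(x)
        self.b += self.rate * err
        for i, xi in enumerate(x):
            self.w[i] += self.rate * err * xi

# Train on a toy linearly separable problem: label is 1 when
# the two features sum to more than 1.0.
random.seed(0)
model = OnlineLogistic(num_features=2, rate=0.5)
for _ in range(2000):
    x = [random.random(), random.random()]
    y = 1 if x[0] + x[1] > 1.0 else 0
    model.train(y, x)
```

Each `train` call touches the model state, so examples must be processed in order; the parallelization strategies discussed earlier exist precisely to work around this per-example dependency.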
+<h2 id="references">References</h2>
+<p>[1] <a href=";language=en">Thesis
+by Paul Komarek</a></p>
+<p>[2] <a href="">An
Introduction To Mahout's Logistic Regression SGD Classifier</a></p>
+<h2 id="examples">Examples</h2>
+<p>[3] <a href="">SGD
Bank Marketing Example</a></p>
