mahout-commits mailing list archives

From tdunn...@apache.org
Subject svn commit: r995189 - /mahout/trunk/core/src/main/java/org/apache/mahout/classifier/sgd/AdaptiveLogisticRegression.java
Date Wed, 08 Sep 2010 18:49:59 GMT
Author: tdunning
Date: Wed Sep  8 18:49:59 2010
New Revision: 995189

URL: http://svn.apache.org/viewvc?rev=995189&view=rev
Log:
Use logLikelihood for fitness in non-binary case

Modified:
    mahout/trunk/core/src/main/java/org/apache/mahout/classifier/sgd/AdaptiveLogisticRegression.java

Modified: mahout/trunk/core/src/main/java/org/apache/mahout/classifier/sgd/AdaptiveLogisticRegression.java
URL: http://svn.apache.org/viewvc/mahout/trunk/core/src/main/java/org/apache/mahout/classifier/sgd/AdaptiveLogisticRegression.java?rev=995189&r1=995188&r2=995189&view=diff
==============================================================================
--- mahout/trunk/core/src/main/java/org/apache/mahout/classifier/sgd/AdaptiveLogisticRegression.java (original)
+++ mahout/trunk/core/src/main/java/org/apache/mahout/classifier/sgd/AdaptiveLogisticRegression.java Wed Sep  8 18:49:59 2010
@@ -44,11 +44,18 @@ import java.util.concurrent.ExecutionExc
  * of performance on the fly even if we make many passes through the data.  This does, however,
  * increase the cost of training since if we are using 5-fold cross-validation, each vector is used
  * 4 times for training and once for classification.  If this becomes a problem, then we should
- * probably use a 2-way unbalanced train/test split rather than full cross validation.
- *
+ * probably use a 2-way unbalanced train/test split rather than full cross validation.  With the
+ * current default settings, we have 100 learners running.  This is better than the alternative
+ * of running hundreds of training passes to find good hyper-parameters because we only have to
+ * parse and feature-ize our inputs once.  If you already have good hyper-parameters, then you
+ * might prefer to just run one CrossFoldLearner with those settings.
+ * <p/>
  * The fitness used here is AUC.  Another alternative would be to try log-likelihood, but it is
  * much easier to get bogus values of log-likelihood than with AUC and the results seem to
- * accord pretty well.  It would be nice to allow the fitness function to be pluggable.
+ * accord pretty well.  It would be nice to allow the fitness function to be pluggable. This
+ * use of AUC means that AdaptiveLogisticRegression is mostly suited for binary target variables.
+ * This will be fixed before long by extending OnlineAuc to handle non-binary cases or by using
+ * a different fitness value in non-binary cases.
  */
 public class AdaptiveLogisticRegression implements OnlineLearner {
   private int record = 0;
@@ -100,7 +107,11 @@ public class AdaptiveLogisticRegression 
             x.train(example);
           }
           if (x.getLearner().validModel()) {
-            return x.wrapped.auc();
+            if (x.getLearner().numCategories() == 2) {
+              return x.wrapped.auc();
+            } else {
+              return x.wrapped.logLikelihood();
+            }
           } else {
             return Double.NaN;
           }



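[Editor's note] For readers unfamiliar with this class, below is a minimal, hypothetical usage sketch; it is not part of this commit. It assumes the Mahout SGD API of this era (AdaptiveLogisticRegression, CrossFoldLearner, L1, DenseVector); the class name AdaptiveExample, the feature count, and the synthetic data are illustrative only. With two target categories the evolutionary search scores candidates by AUC; with more than two it now falls back to log-likelihood, per the change above.

import java.util.Random;

import org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression;
import org.apache.mahout.classifier.sgd.CrossFoldLearner;
import org.apache.mahout.classifier.sgd.L1;
import org.apache.mahout.math.DenseVector;
import org.apache.mahout.math.Vector;

// Hypothetical sketch, not part of this commit.
public class AdaptiveExample {
  public static void main(String[] args) {
    int features = 20;
    Random rand = new Random(42);

    // Two target categories, so candidate fitness is AUC; with more than
    // two categories the fitness would now be log-likelihood instead.
    AdaptiveLogisticRegression learner =
        new AdaptiveLogisticRegression(2, features, new L1());

    // Synthetic training data: the target is 1 when the feature sum is positive.
    for (int i = 0; i < 10000; i++) {
      Vector v = new DenseVector(features);
      double sum = 0;
      for (int j = 0; j < features; j++) {
        double value = rand.nextGaussian();
        v.set(j, value);
        sum += value;
      }
      learner.train(sum > 0 ? 1 : 0, v);
    }
    learner.close();

    // The best member of the evolved population wraps a CrossFoldLearner
    // (may be null if too little data has been seen); auc() is its fitness here.
    CrossFoldLearner best = learner.getBest().getPayload().getLearner();
    System.out.println("AUC of best learner: " + best.auc());
  }
}

As the javadoc notes, if good hyper-parameters are already known, running a single CrossFoldLearner with those settings avoids the cost of evolving 100 learners.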