spark-reviews mailing list archives

From jrdi <...@git.apache.org>
Subject [GitHub] spark pull request #7080: [SPARK-8700][ML] Disable feature scaling in Logist...
Date Tue, 06 Mar 2018 10:40:34 GMT
Github user jrdi commented on a diff in the pull request:

    https://github.com/apache/spark/pull/7080#discussion_r171688655
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
---
    @@ -534,27 +559,43 @@ private class LogisticCostFun(
               case (aggregator1, aggregator2) => aggregator1.merge(aggregator2)
             })
     
    -    // regVal is the sum of weight squares for L2 regularization
    -    val norm = if (regParamL2 == 0.0) {
    -      0.0
    -    } else if (fitIntercept) {
    -      brzNorm(Vectors.dense(weights.toArray.slice(0, weights.size -1)).toBreeze, 2.0)
    -    } else {
    -      brzNorm(weights, 2.0)
    -    }
    -    val regVal = 0.5 * regParamL2 * norm * norm
    +    val totalGradientArray = logisticAggregator.gradient.toArray
     
    -    val loss = logisticAggregator.loss + regVal
    -    val gradient = logisticAggregator.gradient
    -
    -    if (fitIntercept) {
    -      val wArray = w.toArray.clone()
    -      wArray(wArray.length - 1) = 0.0
    -      axpy(regParamL2, Vectors.dense(wArray), gradient)
    +    // regVal is the sum of weight squares excluding intercept for L2 regularization.
    +    val regVal = if (regParamL2 == 0.0) {
    +      0.0
         } else {
    -      axpy(regParamL2, w, gradient)
    +      var sum = 0.0
    +      w.foreachActive { (index, value) =>
    +        // If `fitIntercept` is true, the last term which is intercept doesn't
    +        // contribute to the regularization.
    +        if (index != numFeatures) {
    +          // The following code will compute the loss of the regularization; also
    +          // the gradient of the regularization, and add back to totalGradientArray.
    +          sum += {
    +            if (standardization) {
    +              totalGradientArray(index) += regParamL2 * value
    +              value * value
    +            } else {
    +              if (featuresStd(index) != 0.0) {
    +                // If `standardization` is false, we still standardize the data
    +                // to improve the rate of convergence; as a result, we have to
    +                // perform this reverse standardization by penalizing each component
    +                // differently to get effectively the same objective function when
    +                // the training dataset is not standardized.
    +                val temp = value / (featuresStd(index) * featuresStd(index))
    +                totalGradientArray(index) += regParamL2 * temp
    +                value * temp
    --- End diff --
    
    @dbtsai I understand this change was implemented years ago, but I don't get why you are creating `temp` as `value / std^2`. Isn't it supposed to be standardized? Shouldn't it be `value / std`? In addition, `value * temp` is equivalent to `(value / std)^2`, which makes more sense to me.
    
    Comparing it to the standardized branch, where we add `regParamL2 * standardizedValue` to the gradient and 'return' `standardizedValue^2`: here we are adding `regParamL2 * standardizedValue / std` and returning `standardizedValue^2`.
    
    Does that make sense? Am I missing something here?
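
    To make the question concrete, here is a small standalone sketch of how I read the two branches (the object name, the input values, and the loop are made up for illustration; only the per-index arithmetic is meant to mirror the diff):

    ```scala
    // Standalone sketch (hypothetical values) of the two L2 branches quoted above.
    object L2PenaltySketch {
      def main(args: Array[String]): Unit = {
        val regParamL2 = 0.1
        val standardization = false
        val coefficients = Array(1.5, -2.0)   // `value` in the diff
        val featuresStd = Array(0.5, 4.0)     // per-feature standard deviations
        val gradient = Array.fill(coefficients.length)(0.0)

        var sum = 0.0
        coefficients.zipWithIndex.foreach { case (value, index) =>
          sum += {
            if (standardization) {
              // standardized branch: penalize `value` directly
              gradient(index) += regParamL2 * value
              value * value
            } else if (featuresStd(index) != 0.0) {
              // reverse-standardization branch: divide by std twice
              val temp = value / (featuresStd(index) * featuresStd(index))
              gradient(index) += regParamL2 * temp
              value * temp                     // == (value / std)^2
            } else {
              0.0
            }
          }
        }
        println(s"regVal = ${0.5 * regParamL2 * sum}, gradient = ${gradient.mkString(", ")}")
      }
    }
    ```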


