mahout-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From rawkintr...@apache.org
Subject [30/62] [abbrv] mahout git commit: WEBSITE Added serial correlation docs
Date Fri, 05 May 2017 01:41:34 GMT
WEBSITE Added serial correlation docs


Project: http://git-wip-us.apache.org/repos/asf/mahout/repo
Commit: http://git-wip-us.apache.org/repos/asf/mahout/commit/fc433408
Tree: http://git-wip-us.apache.org/repos/asf/mahout/tree/fc433408
Diff: http://git-wip-us.apache.org/repos/asf/mahout/diff/fc433408

Branch: refs/heads/master
Commit: fc433408f4f2c9b6065598fa27fe7f9e1a3533a0
Parents: 6e5359d
Author: rawkintrevo <trevor.d.grant@gmail.com>
Authored: Wed May 3 18:09:40 2017 -0500
Committer: rawkintrevo <trevor.d.grant@gmail.com>
Committed: Wed May 3 18:09:40 2017 -0500

----------------------------------------------------------------------
 .../serial-correlation/cochrane-orcutt.md       | 134 ++++++++++++++++++-
 .../regression/serial-correlation/dw-test.md    |  33 ++++-
 2 files changed, 161 insertions(+), 6 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/mahout/blob/fc433408/website/docs/algorithms/regression/serial-correlation/cochrane-orcutt.md
----------------------------------------------------------------------
diff --git a/website/docs/algorithms/regression/serial-correlation/cochrane-orcutt.md b/website/docs/algorithms/regression/serial-correlation/cochrane-orcutt.md
index 78bff65..b155ca8 100644
--- a/website/docs/algorithms/regression/serial-correlation/cochrane-orcutt.md
+++ b/website/docs/algorithms/regression/serial-correlation/cochrane-orcutt.md
@@ -4,13 +4,143 @@ title: Cochrane-Orcutt Procedure
 theme:
     name: mahout2
 ---
-TODO: Fill this out!
-Stub
 
 ### About
 
+The [Cochrane Orcutt](https://en.wikipedia.org/wiki/Cochrane%E2%80%93Orcutt_estimation) procedure
is use in economics to 
+adjust a linear model for serial correlation in the error term. 
+ 
+The cooresponding method in R is [`cochrane.orcutt`](https://cran.r-project.org/web/packages/orcutt/orcutt.pdf)
+however the implementation differes slightly. 
+
+#### R Prototype:
+    library(orcutt)
+
+    df = data.frame(t(data.frame(
+        c(20.96,  127.3),
+        c(21.40,  130.0),
+        c(21.96,  132.7),
+        c(21.52,  129.4),
+        c(22.39,  135.0),
+        c(22.76,  137.1),
+        c(23.48,  141.2),
+        c(23.66,  142.8),
+        c(24.10,  145.5),
+        c(24.01,  145.3),
+        c(24.54,  148.3),
+        c(24.30,  146.4),
+        c(25.00,  150.2),
+        c(25.64,  153.1),
+        c(26.36,  157.3),
+        c(26.98,  160.7),
+        c(27.52,  164.2),
+        c(27.78,  165.6),
+        c(28.24,  168.7),
+        c(28.78,  171.7))))
+
+    rownames(df) <- NULL
+    colnames(df) <- c("y", "x")
+    my_lm = lm(y ~ x, data=df)
+    coch = cochrane.orcutt(my_lm)
+
+    
+The R-implementation is kind of...silly.
+
+The above works- converges at 318 iterations- the transformed DW is   1.72, yet the rho is
+ .95882.   After 318 iteartions, this will also report a rho of .95882 (which sugguests SEVERE
+ autocorrelation- nothing close to 1.72.
+
+ At anyrate, the real prototype for this is the example from [Applied Linear Statistcal Models
+ 5th Edition by Kunter, Nachstheim, Neter, and Li](https://www.amazon.com/Applied-Linear-Statistical-Models-Hardcover/dp/B010EWX85C/ref=sr_1_4?ie=UTF8&qid=1493847480&sr=8-4&keywords=applied+linear+statistical+models+5th+edition).
 
+ 
+Steps:
+1. Normal Regression
+2. Estimate <foo>\(\rho\)</foo>
+3. Get Estimates of Transformed Equation
+4. Step 5: Use Betas from (4) to recalculate model from (1)
+5. Step 6: repeat  Step 2 through 5 until a stopping criteria is met. Some models call for
convergence-
+Kunter et. al reccomend 3 iterations, if you don't achieve desired results, use an alternative
method.
+ 
+#### Some additional notes from Applied Linear Statistical Models:
+ They also provide some interesting notes on p 494:
+ 
+ 1. "Cochrane-Orcutt does not always work properly.  A major reason is that when the error
terms
+ are positively autocorrelated, the estimate <foo>\(r\)</foo> in (12.22) tends
to underestimate the autocorrelation
+ parameter <foo>\(\rho\)</foo>.  When this bias is serious, it can significantly
reduce the effectiveness of the
+ Cochrane-Orcutt approach.
+ 1. "There exists an approximate relation between the [Durbin Watson test statistic](dw-test.html)
<foo>\(\mathbf{D}\)</foo> in (12.14)
+ and the estimated autocorrelation paramater <foo>\(r\)</foo> in (12.22):
+ <center>\(D ~= 2(1-\rho)\)</center>
+
+ They also note on p492:
+ "... If the process does not terminate after one or two iterations, a different procedure
+ should be employed."
+ This differs from the logic found elsewhere, and the method presented in R where, in the
simple
+  example in the prototype, the procedure runs for 318 iterations. This is why the default
+ maximum iteratoins are 3, and should be left as such.
+
+ Also, the prototype and 'correct answers' are based on the example presented in Kunter et.
al on
+ p492-4 (including dataset).
+
+
 ### Parameters
 
+
+<div class="table-striped">
+  <table class="table">
+    <tr>
+        <th>Parameter</th>
+        <th>Description</th>
+        <th>Default Value</th>
+     </tr>
+     <tr>
+        <td><code>'regressor</code></td>
+        <td>Any subclass of <code>org.apache.mahout.math.algorithms.regression.LinearRegressorFitter</code></td>
+        <td><code>OrdinaryLeastSquares()</code></td>
+     </tr>
+     <tr>
+        <td><code>'iteratoins</code></td>
+        <td>Unlike our friends in R- we stick to the 3 iteration guidance.</td>
+        <td>3</td>
+     </tr>
+     <tr>
+        <td><code>'cacheHint</code></td>
+        <td>The DRM Cache Hint to use when holding the data in memory between iterations</td>
+        <td><code>CacheHint.MEMORY_ONLY</code></td>
+     </tr>                 
+  </table>
+</div>
+
 ### Example
 
 
+    val alsmBlaisdellCo = drmParallelize( dense(
+          (20.96,  127.3),
+          (21.40,  130.0),
+          (21.96,  132.7),
+          (21.52,  129.4),
+          (22.39,  135.0),
+          (22.76,  137.1),
+          (23.48,  141.2),
+          (23.66,  142.8),
+          (24.10,  145.5),
+          (24.01,  145.3),
+          (24.54,  148.3),
+          (24.30,  146.4),
+          (25.00,  150.2),
+          (25.64,  153.1),
+          (26.36,  157.3),
+          (26.98,  160.7),
+          (27.52,  164.2),
+          (27.78,  165.6),
+          (28.24,  168.7),
+          (28.78,  171.7) ))
+    
+    val drmY = alsmBlaisdellCo(::, 0 until 1)
+    val drmX = alsmBlaisdellCo(::, 1 until 2)
+
+    var coModel = new CochraneOrcutt[Int]().fit(drmX, drmY , ('iterations -> 2))
+    
+    println(coModel.rhos)
+    println(coModel.summary)
+

http://git-wip-us.apache.org/repos/asf/mahout/blob/fc433408/website/docs/algorithms/regression/serial-correlation/dw-test.md
----------------------------------------------------------------------
diff --git a/website/docs/algorithms/regression/serial-correlation/dw-test.md b/website/docs/algorithms/regression/serial-correlation/dw-test.md
index 64ca831..7bdd896 100644
--- a/website/docs/algorithms/regression/serial-correlation/dw-test.md
+++ b/website/docs/algorithms/regression/serial-correlation/dw-test.md
@@ -5,14 +5,39 @@ theme:
     name: mahout2
 ---
 
-stub
-TODO: Fill this out!
-Stub
-
 ### About
 
+The [Durbin Watson Test](https://en.wikipedia.org/wiki/Durbin%E2%80%93Watson_statistic) is
a test for serial correlation
+in error terms.  The Durbin Watson test statistic <foo>\(d\)</foo> can take values
between 0 and 4, and in general
+
+- <foo>\(d \lt 1.5 \)</foo> implies positive autocorrelation
+- <foo>\(d \gt 2.5 \)</foo>  implies negative autocorrelation
+- <foo>\(1.5 \lt d \lt 2.5 \)</foo>  implies to autocorrelation.
+
+Implementation is based off of the `durbinWatsonTest` function in the [`car`](https://cran.r-project.org/web/packages/car/index.html)
package in R
+
 ### Parameters
 
 ### Example
 
+#### R Prototype
+    
+    library(car)
+    residuals <- seq(0, 4.9, 0.1)
+    ## perform Durbin-Watson test
+    durbinWatsonTest(residuals)
+    
+#### In Apache Mahout
+
+    
+    // A DurbinWatson Test must be performed on a model. The model does not matter.
+    val drmX = drmParallelize( dense((0 until 50).toArray.map( t => Math.pow(-1.0, t))
) ).t
+    val drmY = drmX + err1 + 1
+    var model = new OrdinaryLeastSquares[Int]().fit(drmX, drmY)
+    // end arbitrary model 
+    
+    val err1 =  drmParallelize( dense((0.0 until 5.0 by 0.1).toArray) ).t
+    val syntheticResiduals = err1
+    model = AutocorrelationTests.DurbinWatson(model, syntheticResiduals)
+    val myAnswer: Double = model.testResults.getOrElse('durbinWatsonTestStatistic, -1.0).asInstanceOf[Double]
 


Mime
View raw message