From: rawkintrevo@apache.org
To: commits@mahout.apache.org
Date: Fri, 05 May 2017 01:41:34 -0000
Message-Id: <0289989bca5345f88a88e08bbd426515@git.apache.org>
Subject: [30/62] [abbrv] mahout git commit: WEBSITE Added serial correlation docs

WEBSITE Added serial correlation docs

Project: http://git-wip-us.apache.org/repos/asf/mahout/repo
Commit: http://git-wip-us.apache.org/repos/asf/mahout/commit/fc433408
Tree: http://git-wip-us.apache.org/repos/asf/mahout/tree/fc433408
Diff: 
http://git-wip-us.apache.org/repos/asf/mahout/diff/fc433408

Branch: refs/heads/master
Commit: fc433408f4f2c9b6065598fa27fe7f9e1a3533a0
Parents: 6e5359d
Author: rawkintrevo
Authored: Wed May 3 18:09:40 2017 -0500
Committer: rawkintrevo
Committed: Wed May 3 18:09:40 2017 -0500

----------------------------------------------------------------------
 .../serial-correlation/cochrane-orcutt.md | 134 ++++++++++++++++++-
 .../regression/serial-correlation/dw-test.md | 33 ++++-
 2 files changed, 161 insertions(+), 6 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/mahout/blob/fc433408/website/docs/algorithms/regression/serial-correlation/cochrane-orcutt.md
----------------------------------------------------------------------
diff --git a/website/docs/algorithms/regression/serial-correlation/cochrane-orcutt.md b/website/docs/algorithms/regression/serial-correlation/cochrane-orcutt.md
index 78bff65..b155ca8 100644
--- a/website/docs/algorithms/regression/serial-correlation/cochrane-orcutt.md
+++ b/website/docs/algorithms/regression/serial-correlation/cochrane-orcutt.md
@@ -4,13 +4,143 @@ title: Cochrane-Orcutt Procedure
 theme:
   name: mahout2
 ---
-TODO: Fill this out!
-Stub
 ### About
+The [Cochrane-Orcutt](https://en.wikipedia.org/wiki/Cochrane%E2%80%93Orcutt_estimation) procedure is used in economics to
+adjust a linear model for serial correlation in the error term.
+
+The corresponding method in R is [`cochrane.orcutt`](https://cran.r-project.org/web/packages/orcutt/orcutt.pdf);
+however, the implementation differs slightly.
+
+#### R Prototype:
+    library(orcutt)
+
+    df = data.frame(t(data.frame(
+        c(20.96, 127.3),
+        c(21.40, 130.0),
+        c(21.96, 132.7),
+        c(21.52, 129.4),
+        c(22.39, 135.0),
+        c(22.76, 137.1),
+        c(23.48, 141.2),
+        c(23.66, 142.8),
+        c(24.10, 145.5),
+        c(24.01, 145.3),
+        c(24.54, 148.3),
+        c(24.30, 146.4),
+        c(25.00, 150.2),
+        c(25.64, 153.1),
+        c(26.36, 157.3),
+        c(26.98, 160.7),
+        c(27.52, 164.2),
+        c(27.78, 165.6),
+        c(28.24, 168.7),
+        c(28.78, 171.7))))
+
+    rownames(df) <- NULL
+    colnames(df) <- c("y", "x")
+    my_lm = lm(y ~ x, data=df)
+    coch = cochrane.orcutt(my_lm)
+
+
+The R implementation is kind of...silly.
+
+The above works and converges at 318 iterations. The transformed DW is 1.72, yet after 318 iterations
+ it reports a rho of .95882, which suggests SEVERE autocorrelation, nothing close to 1.72.
+
+ At any rate, the real prototype for this is the example from [Applied Linear Statistical Models
+ 5th Edition by Kutner, Nachtsheim, Neter, and Li](https://www.amazon.com/Applied-Linear-Statistical-Models-Hardcover/dp/B010EWX85C/ref=sr_1_4?ie=UTF8&qid=1493847480&sr=8-4&keywords=applied+linear+statistical+models+5th+edition).
+
+Steps:
+1. Normal Regression
+2. Estimate \(\rho\)
+3. Get Estimates of Transformed Equation
+4. Use Betas from (3) to recalculate model from (1)
+5. Repeat steps 2 through 4 until a stopping criterion is met. Some models call for convergence;
+Kutner et al. recommend 3 iterations, and if you don't achieve the desired results, use an alternative method.
+
+#### Some additional notes from Applied Linear Statistical Models:
+ They also provide some interesting notes on p 494:
+
+ 1. "Cochrane-Orcutt does not always work properly. A major reason is that when the error terms
+ are positively autocorrelated, the estimate \(r\) in (12.22) tends to underestimate the autocorrelation
+ parameter \(\rho\). When this bias is serious, it can significantly reduce the effectiveness of the
+ Cochrane-Orcutt approach."
+ 1. "There exists an approximate relation between the [Durbin Watson test statistic](dw-test.html) \(\mathbf{D}\) in (12.14)
+ and the estimated autocorrelation parameter \(r\) in (12.22):
+
+\(D \approx 2(1-r)\)"
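
The approximation can be motivated in a few lines. This is a sketch that assumes the usual textbook definitions (they are referenced as (12.14) and (12.22) above but not reproduced in this page): \(D\) is the sum of squared successive residual differences divided by the sum of squared residuals, and \(r\) is the lag-one cross product of the residuals divided by the sum of squared lagged residuals.

```latex
D = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}
  = \frac{\sum_{t=2}^{n} e_t^2 + \sum_{t=2}^{n} e_{t-1}^2 - 2\sum_{t=2}^{n} e_{t-1} e_t}{\sum_{t=1}^{n} e_t^2}

% For moderately large n the three sums of squares are nearly equal, so
D \approx 2\left(1 - \frac{\sum_{t=2}^{n} e_{t-1} e_t}{\sum_{t=2}^{n} e_{t-1}^2}\right) = 2(1 - r)
```

Under this relation, \(r\) near 0 corresponds to \(D\) near 2, and \(r\) near 1 corresponds to \(D\) near 0.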
+
+ They also note on p492:
+ "... If the process does not terminate after one or two iterations, a different procedure
+ should be employed."
+ This differs from the logic found elsewhere, and from the method presented in R where, in the simple
+ example in the prototype, the procedure runs for 318 iterations. This is why the default
+ maximum number of iterations is 3, and it should be left as such.
+
+ Also, the prototype and 'correct answers' are based on the example presented in Kutner et al. on
+ p492-4 (including dataset).
+
+ ### Parameters
+
+| Parameter | Description | Default Value |
+|---|---|---|
+| `'regressor` | Any subclass of `org.apache.mahout.math.algorithms.regression.LinearRegressorFitter` | `OrdinaryLeastSquares()` |
+| `'iterations` | Maximum number of iterations. Unlike our friends in R, we stick to the 3 iteration guidance. | `3` |
+| `'cacheHint` | The DRM Cache Hint to use when holding the data in memory between iterations | `CacheHint.MEMORY_ONLY` |
+
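
The steps listed earlier can be sketched in plain Scala for a single predictor. This is a minimal illustration, not the Mahout implementation: the object `CochraneOrcuttSketch` and its methods `ols`, `estimateRho`, and `fit` are hypothetical names introduced here, and the rho estimate and back-transformation follow the usual textbook formulas rather than anything confirmed from the Mahout source.

```scala
// Illustrative sketch only: plain Scala arrays, no Mahout classes.
object CochraneOrcuttSketch {

  /** Ordinary least squares for one predictor; returns (intercept, slope). */
  def ols(x: Array[Double], y: Array[Double]): (Double, Double) = {
    val xBar = x.sum / x.length
    val yBar = y.sum / y.length
    val sxy = x.zip(y).map { case (xi, yi) => (xi - xBar) * (yi - yBar) }.sum
    val sxx = x.map(xi => (xi - xBar) * (xi - xBar)).sum
    val slope = sxy / sxx
    (yBar - slope * xBar, slope)
  }

  /** r = sum(e(t-1) * e(t)) / sum(e(t-1)^2), both sums over t = 2..n. */
  def estimateRho(e: Array[Double]): Double = {
    val lagged = e.init   // e(1) .. e(n-1)
    val current = e.tail  // e(2) .. e(n)
    lagged.zip(current).map { case (a, b) => a * b }.sum / lagged.map(a => a * a).sum
  }

  /** Returns (intercept, slope, rho estimated at each iteration). */
  def fit(x: Array[Double], y: Array[Double], iterations: Int = 3): (Double, Double, Vector[Double]) = {
    // Step 1: normal regression on the original data
    var (b0, b1) = ols(x, y)
    var rhos = Vector.empty[Double]
    for (_ <- 0 until iterations) {
      // Step 2: estimate rho from the residuals of the current model
      val residuals = x.zip(y).map { case (xi, yi) => yi - (b0 + b1 * xi) }
      val r = estimateRho(residuals)
      rhos = rhos :+ r
      // Step 3: regress y(t) - r*y(t-1) on x(t) - r*x(t-1)
      val xT = x.tail.zip(x.init).map { case (cur, prev) => cur - r * prev }
      val yT = y.tail.zip(y.init).map { case (cur, prev) => cur - r * prev }
      val (b0T, b1T) = ols(xT, yT)
      // Step 4: map the transformed estimates back to the original model
      b0 = b0T / (1.0 - r)
      b1 = b1T
    }
    (b0, b1, rhos)
  }
}
```

Calling `CochraneOrcuttSketch.fit(x, y, iterations = 2)` with the `x` and `y` columns of the dataset used on this page returns the back-transformed coefficients together with one rho estimate per iteration, which is roughly the information the `rhos` field exposes in the Mahout example.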
+ ### Example
+    val alsmBlaisdellCo = drmParallelize( dense(
+        (20.96, 127.3),
+        (21.40, 130.0),
+        (21.96, 132.7),
+        (21.52, 129.4),
+        (22.39, 135.0),
+        (22.76, 137.1),
+        (23.48, 141.2),
+        (23.66, 142.8),
+        (24.10, 145.5),
+        (24.01, 145.3),
+        (24.54, 148.3),
+        (24.30, 146.4),
+        (25.00, 150.2),
+        (25.64, 153.1),
+        (26.36, 157.3),
+        (26.98, 160.7),
+        (27.52, 164.2),
+        (27.78, 165.6),
+        (28.24, 168.7),
+        (28.78, 171.7) ))
+
+    val drmY = alsmBlaisdellCo(::, 0 until 1)
+    val drmX = alsmBlaisdellCo(::, 1 until 2)
+
+    var coModel = new CochraneOrcutt[Int]().fit(drmX, drmY , ('iterations -> 2))
+
+    println(coModel.rhos)
+    println(coModel.summary)
+

http://git-wip-us.apache.org/repos/asf/mahout/blob/fc433408/website/docs/algorithms/regression/serial-correlation/dw-test.md
----------------------------------------------------------------------
diff --git a/website/docs/algorithms/regression/serial-correlation/dw-test.md b/website/docs/algorithms/regression/serial-correlation/dw-test.md
index 64ca831..7bdd896 100644
--- a/website/docs/algorithms/regression/serial-correlation/dw-test.md
+++ b/website/docs/algorithms/regression/serial-correlation/dw-test.md
@@ -5,14 +5,39 @@ theme:
   name: mahout2
 ---
 
-stub
-TODO: Fill this out!
-Stub
-
 ### About
 
+The [Durbin Watson Test](https://en.wikipedia.org/wiki/Durbin%E2%80%93Watson_statistic) is a test for serial correlation
+in error terms. The Durbin Watson test statistic \(d\) can take values between 0 and 4, and in general
+
+- \(d \lt 1.5 \) implies positive autocorrelation
+- \(d \gt 2.5 \) implies negative autocorrelation
+- \(1.5 \lt d \lt 2.5 \) implies no autocorrelation.
+
+Implementation is based on the `durbinWatsonTest` function in the [`car`](https://cran.r-project.org/web/packages/car/index.html) package in R.
+
 ### Parameters
 
 ### Example
 
+#### R Prototype
+
+    library(car)
+    residuals <- seq(0, 4.9, 0.1)
+    ## perform Durbin-Watson test
+    durbinWatsonTest(residuals)
+
+#### In Apache Mahout
+
+
+    // Synthetic residuals 0.0, 0.1, ..., 4.9; defined first because the model below uses them.
+    val err1 = drmParallelize( dense((0.0 until 5.0 by 0.1).toArray) ).t
+
+    // A DurbinWatson Test must be performed on a model. The model does not matter.
+    val drmX = drmParallelize( dense((0 until 50).toArray.map( t => Math.pow(-1.0, t)) ) ).t
+    val drmY = drmX + err1 + 1
+    var model = new OrdinaryLeastSquares[Int]().fit(drmX, drmY)
+    // end arbitrary model
+
+    val syntheticResiduals = err1
+    model = AutocorrelationTests.DurbinWatson(model, syntheticResiduals)
+    val myAnswer: Double = model.testResults.getOrElse('durbinWatsonTestStatistic, -1.0).asInstanceOf[Double]
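
As a cross-check on the example above, the test statistic can also be computed directly from a residual vector. The sketch below is plain Scala and does not use any Mahout API; `DurbinWatsonSketch` and `statistic` are hypothetical names introduced for illustration, and the formula is the standard definition of the statistic rather than something taken from the Mahout source.

```scala
// Illustrative sketch only: computes the Durbin-Watson statistic from residuals.
object DurbinWatsonSketch {

  /** d = sum over t = 2..n of (e(t) - e(t-1))^2, divided by sum over t = 1..n of e(t)^2. */
  def statistic(e: Array[Double]): Double = {
    val numerator = e.tail.zip(e.init).map { case (cur, prev) => (cur - prev) * (cur - prev) }.sum
    val denominator = e.map(v => v * v).sum
    numerator / denominator
  }

  def main(args: Array[String]): Unit = {
    // Same synthetic residuals as the R prototype: 0.0, 0.1, ..., 4.9
    val residuals = Array.tabulate(50)(i => i * 0.1)
    println(statistic(residuals))
  }
}
```

For these steadily increasing residuals the numerator is 49 × 0.01 = 0.49 and the denominator is 404.25, so the statistic is about 0.0012, far below 1.5, which is the strongly positive autocorrelation one would expect from a monotone residual series.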