spark-issues mailing list archives

From "Xiangrui Meng (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (SPARK-9836) Provide R-like summary statistics for ordinary least squares via normal equation solver
Date Tue, 03 Nov 2015 16:30:27 GMT

     [ https://issues.apache.org/jira/browse/SPARK-9836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiangrui Meng resolved SPARK-9836.
----------------------------------
       Resolution: Fixed
    Fix Version/s: 1.6.0

Issue resolved by pull request 9413
[https://github.com/apache/spark/pull/9413]

> Provide R-like summary statistics for ordinary least squares via normal equation solver
> ---------------------------------------------------------------------------------------
>
>                 Key: SPARK-9836
>                 URL: https://issues.apache.org/jira/browse/SPARK-9836
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML
>            Reporter: Xiangrui Meng
>            Assignee: Yanbo Liang
>            Priority: Critical
>             Fix For: 1.6.0
>
>
> In R, model fitting comes with summary statistics. We can provide most of those via the normal
> equation solver (SPARK-9834). If some statistics require additional passes over the dataset,
> we can expose an option to let users select the desired statistics before model fitting.
> {code}
> > summary(model)
> Call:
> glm(formula = Sepal.Length ~ Sepal.Width + Species, data = iris)
> Deviance Residuals: 
>      Min        1Q    Median        3Q       Max  
> -1.30711  -0.25713  -0.05325   0.19542   1.41253  
> Coefficients:
>                   Estimate Std. Error t value Pr(>|t|)    
> (Intercept)         2.2514     0.3698   6.089 9.57e-09 ***
> Sepal.Width         0.8036     0.1063   7.557 4.19e-12 ***
> Speciesversicolor   1.4587     0.1121  13.012  < 2e-16 ***
> Speciesvirginica    1.9468     0.1000  19.465  < 2e-16 ***
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> (Dispersion parameter for gaussian family taken to be 0.1918059)
>     Null deviance: 102.168  on 149  degrees of freedom
> Residual deviance:  28.004  on 146  degrees of freedom
> AIC: 183.94
> Number of Fisher Scoring iterations: 2
> {code}
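
For illustration only, here is a minimal sketch of why the normal equations make most of these statistics cheap: a single pass accumulates X^T X, X^T y and y^T y, and the coefficient estimates, standard errors, t values, dispersion and residual deviance all follow from those aggregates. This is plain Python written for this note, not the code from pull request 9413 and not a Spark API; the names `invert` and `ols_summary` are hypothetical. p-values are left out because they need a Student t CDF, which the Python standard library does not provide.

```python
import math
from typing import Dict, List, Sequence


def invert(matrix: Sequence[Sequence[float]]) -> List[List[float]]:
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity: [A | I] is reduced to [I | A^-1].
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular; features are collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols_summary(rows: Sequence[Sequence[float]], y: Sequence[float]) -> Dict[str, object]:
    """Fit OLS with an intercept and derive R-like summary statistics."""
    n = len(y)
    k = len(rows[0]) + 1  # +1 for the intercept column
    dof = n - k
    if dof <= 0:
        raise ValueError("need more observations than coefficients")

    # Single pass over the data: accumulate X^T X, X^T y and y^T y.
    xtx = [[0.0] * k for _ in range(k)]
    xty = [0.0] * k
    yty = 0.0
    for features, target in zip(rows, y):
        x = [1.0] + list(features)
        for i in range(k):
            xty[i] += x[i] * target
            for j in range(k):
                xtx[i][j] += x[i] * x[j]
        yty += target * target

    # Everything below uses only the aggregates, not the raw data.
    xtx_inv = invert(xtx)
    beta = [sum(xtx_inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    # Residual sum of squares: y^T y - beta^T X^T y (clamped against rounding error).
    rss = max(yty - sum(b * v for b, v in zip(beta, xty)), 0.0)
    dispersion = rss / dof  # estimate of the residual variance
    std_errors = [math.sqrt(dispersion * xtx_inv[i][i]) for i in range(k)]
    t_values = [b / s if s > 0 else math.nan for b, s in zip(beta, std_errors)]
    return {
        "coefficients": beta,
        "std_errors": std_errors,
        "t_values": t_values,
        "dispersion": dispersion,
        "residual_deviance": rss,
        "residual_dof": dof,
    }


if __name__ == "__main__":
    # Toy data, made up for this example: one feature, four observations.
    summary = ols_summary([[0.0], [1.0], [2.0], [3.0]], [1.0, 2.0, 2.0, 3.0])
    for name, value in summary.items():
        print(name, value)
```

On the toy data the intercept comes out at about 1.1 and the slope at about 0.6, with a residual deviance of about 0.2 on 2 degrees of freedom. The dispersion is the residual deviance divided by the residual degrees of freedom, which is the same relationship visible in the R output above (28.004 / 146 ≈ 0.1918). Statistics that depend on individual residuals, such as the deviance residual quantiles, cannot be recovered from these aggregates and would need another pass over the data.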



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

