[ https://issues.apache.org/jira/browse/SPARK-9836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14982905#comment-14982905 ]
Yanbo Liang edited comment on SPARK-9836 at 10/30/15 5:18 PM:

[~mengxr] After a survey, I found that "Deviance Residuals" and "Coefficients: Estimate Std.
Error t value Pr(>|t|)" are statistics for OLS/WLS, so I will add these statistics in this
task.
As for the remaining part:
{quote}
Null deviance: 102.168 on 149 degrees of freedom
Residual deviance: 28.004 on 146 degrees of freedom
AIC: 183.94
Number of Fisher Scoring iterations: 2
{quote}
Some of these statistics depend on IRLS (SPARK-9835). I see that you have opened
SPARK-9837 to track summary statistics for GLMs via IRLS, so these statistics will be part
of SPARK-9837. Please correct me if I have misunderstood. :)
was (Author: yanboliang):
[~mengxr] After a survey, I found that "Deviance Residuals" and "Coefficients: Estimate Std.
Error t value Pr(>|t|)" are statistics for OLS/WLS, so I will add these statistics in this
task.
As for the following part:
{quote}
Null deviance: 102.168 on 149 degrees of freedom
Residual deviance: 28.004 on 146 degrees of freedom
AIC: 183.94
Number of Fisher Scoring iterations: 2
{quote}
Some of these statistics depend on IRLS (SPARK-9835). I see that you have opened
SPARK-9837 to track summary statistics for GLMs via IRLS, so these statistics will be part
of SPARK-9837. Please correct me if I have misunderstood. :)
> Provide R-like summary statistics for ordinary least squares via normal equation solver
> 
>
> Key: SPARK-9836
> URL: https://issues.apache.org/jira/browse/SPARK-9836
> Project: Spark
> Issue Type: Subtask
> Components: ML
> Reporter: Xiangrui Meng
> Assignee: Yanbo Liang
>
> In R, model fitting comes with summary statistics. We can provide most of those via the normal
> equation solver (SPARK-9834). If some statistics require additional passes over the dataset,
> we can expose an option to let users select the desired statistics before model fitting.
> {code}
> > summary(model)
> Call:
> glm(formula = Sepal.Length ~ Sepal.Width + Species, data = iris)
> Deviance Residuals:
>      Min        1Q    Median        3Q       Max
> -1.30711  -0.25713  -0.05325   0.19542   1.41253
> Coefficients:
>                   Estimate Std. Error t value Pr(>|t|)
> (Intercept)         2.2514     0.3698   6.089 9.57e-09 ***
> Sepal.Width         0.8036     0.1063   7.557 4.19e-12 ***
> Speciesversicolor   1.4587     0.1121  13.012  < 2e-16 ***
> Speciesvirginica    1.9468     0.1000  19.465  < 2e-16 ***
> 
> Signif. codes:
> 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> (Dispersion parameter for gaussian family taken to be 0.1918059)
> Null deviance: 102.168 on 149 degrees of freedom
> Residual deviance: 28.004 on 146 degrees of freedom
> AIC: 183.94
> Number of Fisher Scoring iterations: 2
> {code}
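
For illustration only, here is a minimal Python sketch of how the Estimate, Std. Error and t value columns above can be derived from the normal equations. It is not Spark's implementation: the function names solve_normal_equations and ols_summary are hypothetical, the design matrix is assumed to already include an intercept column, and p-values are left out because they need a Student's t CDF that the Python standard library does not provide.

```python
import math


def solve_normal_equations(X, y):
    """Return (beta, inverse of X'X) for least squares on rows X and targets y."""
    p = len(X[0])
    # X'X and X'y are plain sums over rows, so one pass over the data suffices.
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]

    # Invert X'X with Gauss-Jordan elimination and partial pivoting.
    aug = [xtx[i] + [1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular; columns are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    inv = [row[p:] for row in aug]

    beta = [sum(inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    return beta, inv


def ols_summary(X, y):
    """Compute coefficient estimates, standard errors and t values for OLS."""
    n, p = len(X), len(X[0])
    beta, inv = solve_normal_equations(X, y)

    # Residuals need the fitted coefficients, hence a second pass over the data.
    residuals = [yi - sum(b * x for b, x in zip(beta, row)) for row, yi in zip(X, y)]
    rss = sum(r * r for r in residuals)

    # Gaussian dispersion estimate: residual sum of squares over residual
    # degrees of freedom (assumes n > p and a non-perfect fit).
    dispersion = rss / (n - p)
    std_errors = [math.sqrt(dispersion * inv[i][i]) for i in range(p)]
    t_values = [b / se for b, se in zip(beta, std_errors)]

    return {
        "coefficients": beta,
        "std_errors": std_errors,
        "t_values": t_values,
        "residuals": residuals,
        "dispersion": dispersion,
        "residual_df": n - p,
    }
```

In this sketch the coefficient estimates come from a single pass that accumulates X'X and X'y, while the residual-based quantities take a second pass, which is the kind of extra cost the issue description suggests making optional.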

