spark-issues mailing list archives

From "Teng Peng (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-22449) Add BIC for GLM
Date Sun, 05 Nov 2017 05:58:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-22449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Teng Peng updated SPARK-22449:
------------------------------
    Description: 
Currently, we only have AIC for GLM. BIC is another "similar" criterion, widely used and implemented
in all major statistical tools.

Positive reasons: 
1. Completeness.
2. Useful for some users.

Negative reasons:
1. Not sure how many users would actually use BIC.

Possible Implementation:
1. Duplicate AIC's methods. Calculate penalty term independently. Pros: safe & consistent.
Cons: duplication.
2. Let AIC & BIC share the log likelihood through a common method. Calculate penalty term independently.
Pros: similar to scikit-learn. No duplication. Cons: less safe & consistent.
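
A minimal sketch of option 2, written in Python for brevity rather than as Spark code. It assumes the model's log likelihood is already available as a number and uses the standard definitions AIC = 2k - 2*logL and BIC = k*ln(n) - 2*logL; the function and parameter names are hypothetical and do not correspond to any existing Spark API.

```python
import math


def penalized_criterion(log_likelihood: float, penalty: float) -> float:
    """Shared part of both criteria: -2 * logL plus a criterion-specific penalty."""
    return -2.0 * log_likelihood + penalty


def aic(log_likelihood: float, num_params: int) -> float:
    # AIC penalty: 2 * k, where k is the number of estimated parameters.
    return penalized_criterion(log_likelihood, 2.0 * num_params)


def bic(log_likelihood: float, num_params: int, num_obs: int) -> float:
    # BIC penalty: k * ln(n), where n is the number of observations.
    return penalized_criterion(log_likelihood, num_params * math.log(num_obs))


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    log_l, k, n = -120.5, 3, 100
    print("AIC:", aic(log_l, k))
    print("BIC:", bic(log_l, k, n))
```

Only the penalty term differs between the two functions, so any change to how the log likelihood is computed affects both criteria at once. That is the trade-off noted in the cons of option 2.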

Reference:
1. https://stats.stackexchange.com/questions/577/is-there-any-reason-to-prefer-the-aic-or-bic-over-the-other
2. http://users.stat.umn.edu/~yangx374/papers/Pre-Print_2003-10_Biometrika.pdf

Thoughts?

  was:
Currently, we only have AIC for GLM. BIC is another "similar" criterion, widely used and implemented
in all major statistical tools.

Positive reasons: 
1. Completeness.
2. Useful for some users.

Negative reasons:
1. Not sure how many users would actually use BIC.

Possible Implementation:
1. Duplicate almost the same methods for log likelihood part. Calculate penalty term independently.
Pros: safe & consistent. Cons: duplication.
2. Let AIC & BIC share the log likelihood through a common method. Calculate penalty term independently.
Pros: similar to scikit-learn. No duplication. Cons: less safe & consistent.

Reference:
1. https://stats.stackexchange.com/questions/577/is-there-any-reason-to-prefer-the-aic-or-bic-over-the-other
2. http://users.stat.umn.edu/~yangx374/papers/Pre-Print_2003-10_Biometrika.pdf

Thoughts?


> Add BIC for GLM
> ---------------
>
>                 Key: SPARK-22449
>                 URL: https://issues.apache.org/jira/browse/SPARK-22449
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 2.2.0
>            Reporter: Teng Peng
>            Priority: Minor



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

