spark-issues mailing list archives

From "Apache Spark (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (SPARK-11920) ML LinearRegression should use correct dataset in examples and user guide doc
Date Mon, 23 Nov 2015 09:12:11 GMT

     [ https://issues.apache.org/jira/browse/SPARK-11920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-11920:
------------------------------------

    Assignee:     (was: Apache Spark)

> ML LinearRegression should use correct dataset in examples and user guide doc
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-11920
>                 URL: https://issues.apache.org/jira/browse/SPARK-11920
>             Project: Spark
>          Issue Type: Improvement
>          Components: Documentation, ML
>            Reporter: Yanbo Liang
>            Priority: Minor
>
> ML LinearRegression uses data/mllib/sample_libsvm_data.txt as the dataset in its examples
> and user guide doc, but that file is actually a classification dataset rather than a
> regression dataset. We should use data/mllib/sample_linear_regression_data.txt instead.
> The deeper cause is that LinearRegression with the "normal" solver cannot solve this dataset
> correctly, possibly because the problem is ill-conditioned and the labels are unreasonable
> for regression. This issue has been reported in SPARK-11918.
> So we should make this change in the examples and user guides, so that they clearly
> illustrate the usage of the LinearRegression algorithm.
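
The difference between the two kinds of dataset can be checked by looking at the labels in the LIBSVM-format files. The following is a minimal sketch in plain Python, not part of Spark: the helper names `libsvm_labels` and `looks_like_classification` are hypothetical, the threshold of 10 distinct labels is an arbitrary assumption, and the sample rows are made up for illustration rather than copied from the Spark data files.

```python
from typing import Iterable, List


def libsvm_labels(lines: Iterable[str]) -> List[float]:
    """Return the label (first token) of every non-empty LIBSVM-format line."""
    labels = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        labels.append(float(line.split()[0]))
    return labels


def looks_like_classification(labels: List[float], max_classes: int = 10) -> bool:
    """Heuristic only: a few distinct, integer-valued labels suggest class ids."""
    distinct = set(labels)
    return len(distinct) <= max_classes and all(v.is_integer() for v in distinct)


# Hypothetical rows in "label index:value ..." form, invented for this sketch.
classification_rows = ["0 1:0.5 3:1.2", "1 2:0.7 3:0.1", "1 1:0.9", "0 2:0.3"]
regression_rows = ["-9.49 1:0.45 2:0.36", "0.25 1:-0.83 2:0.18", "4.71 1:0.12 2:-0.90"]

print(looks_like_classification(libsvm_labels(classification_rows)))
print(looks_like_classification(libsvm_labels(regression_rows)))
```

To run the same check on a real file, the lines could be read with `open(path)` and passed to `libsvm_labels`. The heuristic would misjudge a regression dataset that happens to have only a few integer-valued targets, so it is a quick sanity check rather than a definitive test.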



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

