spark-issues mailing list archives

From "Sean Owen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-2341) loadLibSVMFile doesn't handle regression datasets
Date Fri, 18 Jul 2014 11:27:04 GMT

    [ https://issues.apache.org/jira/browse/SPARK-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14066283#comment-14066283 ]

Sean Owen commented on SPARK-2341:
----------------------------------

[~mengxr] Here is an example of changing the argument:
https://github.com/srowen/spark/commit/4a584ff9c0ada3d035d4668ecf22ec0e65ed16b6

I won't open a PR yet. I think this is a better API at this point, but the question is more
whether the weight of the deprecated methods is worth it. This is another data point to keep
in mind regarding how APIs can evolve.

> loadLibSVMFile doesn't handle regression datasets
> -------------------------------------------------
>
>                 Key: SPARK-2341
>                 URL: https://issues.apache.org/jira/browse/SPARK-2341
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib
>    Affects Versions: 1.0.0
>            Reporter: Eustache
>            Priority: Minor
>              Labels: easyfix
>
> Many datasets exist in LibSVM format for regression tasks [1], but currently the loadLibSVMFile
> primitive doesn't handle regression datasets.
> More precisely, the LabelParser is either a MulticlassLabelParser or a BinaryLabelParser.
> What happens then is that the file is loaded, but in multiclass mode: each target value is
> interpreted as a class name!
> The fix would be to write a RegressionLabelParser that converts target values to Double
> and plug it into the loadLibSVMFile routine.
> [1] http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/regression.html 
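
To illustrate the fix proposed in the issue description, here is a minimal Python sketch of
pluggable label parsing for LibSVM-format lines. It is not Spark's code: parse_libsvm_line,
binary_label, and regression_label are hypothetical names, and the binary rule used here
(a positive target maps to 1.0, anything else to 0.0) is an assumption made for the example.

```python
from typing import Callable, Dict, Tuple

# A label parser turns the first token of a LibSVM line into a numeric label.
LabelParser = Callable[[str], float]


def binary_label(token: str) -> float:
    """Hypothetical binary parser: positive targets become 1.0, the rest 0.0."""
    return 1.0 if float(token) > 0 else 0.0


def regression_label(token: str) -> float:
    """Hypothetical regression parser: keep the target as a real value."""
    return float(token)


def parse_libsvm_line(line: str, parse_label: LabelParser) -> Tuple[float, Dict[int, float]]:
    """Parse one 'label index:value index:value ...' line.

    Returns the parsed label and a sparse feature map keyed by zero-based index.
    """
    tokens = line.split()
    label = parse_label(tokens[0])
    features: Dict[int, float] = {}
    for item in tokens[1:]:
        index, value = item.split(":")
        features[int(index) - 1] = float(value)  # LibSVM indices are one-based
    return label, features
```

With the regression parser, a target such as 2.5 is preserved as-is, whereas the binary parser
in this sketch would collapse it to 1.0. Passing the parser as an argument is one way to let the
caller choose the interpretation without changing the loading routine itself.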



--
This message was sent by Atlassian JIRA
(v6.2#6252)
