spark-issues mailing list archives

From "Nick Pentreath (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SPARK-15944) Make spark.ml package backward compatible with spark.mllib vectors
Date Thu, 30 Jun 2016 11:47:10 GMT

    [ https://issues.apache.org/jira/browse/SPARK-15944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15356944#comment-15356944 ]

Nick Pentreath edited comment on SPARK-15944 at 6/30/16 11:46 AM:
------------------------------------------------------------------

As I commented on [PR #13924|https://github.com/apache/spark/pull/13924]:

bq. It happens to work for dense vectors because it effectively calls {{np.array(DenseVector)}}, but not for sparse vectors. The workaround is fairly ugly:
{{mlSV = NewVectors.sparse(mllibSV.size, zip(mllibSV.indices, mllibSV.values))}}, or something similar.

I think we need convenience methods for Python too - I've created SPARK-16328 to track that.
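
The sparse workaround quoted above can be sketched roughly as follows. This is only an illustration under stated assumptions: {{OldSparse}} is a hypothetical stand-in for an old-style mllib sparse vector (so the snippet runs without Spark), and {{sparse_args}} and {{convert_sparse}} are made-up helper names. Only the {{size}}, {{indices}} and {{values}} attributes and the {{NewVectors.sparse(size, pairs)}} call shape come from the comment itself.

```python
from collections import namedtuple

# Hypothetical stand-in for an old-style mllib SparseVector. Only the three
# attributes used in the quoted workaround (size, indices, values) are modelled.
OldSparse = namedtuple("OldSparse", ["size", "indices", "values"])


def sparse_args(old_sv):
    """Return (size, [(index, value), ...]) extracted from a sparse vector."""
    # Plain int/float pairs, so the result does not depend on NumPy scalar types.
    pairs = [(int(i), float(v)) for i, v in zip(old_sv.indices, old_sv.values)]
    return old_sv.size, pairs


def convert_sparse(old_sv, sparse_factory):
    """Rebuild the vector by calling sparse_factory(size, pairs)."""
    size, pairs = sparse_args(old_sv)
    return sparse_factory(size, pairs)


old = OldSparse(size=5, indices=[1, 3], values=[2.0, 4.5])
size, pairs = sparse_args(old)
print(size, pairs)

# With PySpark available, the factory would presumably be the new ml
# Vectors.sparse, as in the quoted workaround (assumption, not executed here):
#   from pyspark.ml.linalg import Vectors as NewVectors
#   mlSV = convert_sparse(mllibSV, NewVectors.sparse)
```

Passing the factory in as an argument keeps the sketch independent of which vector package is installed. A real convenience method, as proposed for SPARK-16328, would more likely live on the vector classes themselves.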






> Make spark.ml package backward compatible with spark.mllib vectors
> ------------------------------------------------------------------
>
>                 Key: SPARK-15944
>                 URL: https://issues.apache.org/jira/browse/SPARK-15944
>             Project: Spark
>          Issue Type: Umbrella
>          Components: ML, MLlib
>    Affects Versions: 2.0.0
>            Reporter: Xiangrui Meng
>            Assignee: Xiangrui Meng
>            Priority: Critical
>
> During QA, we found that it is not trivial to convert a DataFrame with old vector columns to new vector columns. So it would be easier for users to migrate their datasets and pipelines if we:
> 1) provide utils to convert DataFrames with vector columns
> 2) automatically detect and convert old vector columns in ML pipelines
> This is an umbrella JIRA to track the progress.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

