spark-issues mailing list archives

From "Yanbo Liang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-9793) PySpark DenseVector, SparseVector should override __eq__
Date Sat, 15 Aug 2015 09:01:45 GMT

    [ https://issues.apache.org/jira/browse/SPARK-9793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14698190#comment-14698190 ]

Yanbo Liang commented on SPARK-9793:
------------------------------------

[~josephkb] I have combined this with SPARK-9940 and merged the two PRs.
The new PR gives PySpark Vector semantic equality and makes its hash use the first 16 entries,
as Scala does.
It fixes the following issues from [~mengxr]'s list at SPARK-9750:
* Python
** DenseVector: Semantic eq but only with `DenseVector`. Default hash. -> bug
** SparseVector: Semantic eq but wrong (only with `SparseVector` and not handling explicit zeros). Default hash. -> bug
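
To illustrate the intended behavior, here is a minimal sketch in plain Python. It is not the code from the PR or from pyspark.mllib.linalg: the class names `Dense` and `Sparse`, the helper `nonzeros`, and the reading of "first 16 entries" as the first 16 nonzero (index, value) pairs are all assumptions made for the example. Equality compares what the vectors represent, so a dense and a sparse vector can be equal and explicit zeros are ignored, and the hash is derived from the same data so that equal vectors hash equally.

```python
from itertools import islice


class _VectorBase:
    """Hypothetical base class; NOT the pyspark.mllib.linalg implementation."""

    def nonzeros(self):
        """Yield (index, value) pairs for nonzero entries, in index order."""
        raise NotImplementedError

    def __eq__(self, other):
        # Semantic equality: same size and same nonzero entries,
        # regardless of whether each side is stored densely or sparsely.
        if not isinstance(other, _VectorBase):
            return NotImplemented
        return (self.size == other.size
                and list(self.nonzeros()) == list(other.nonzeros()))

    def __hash__(self):
        # Assumption: hash the size plus at most the first 16 nonzero entries.
        # Equal vectors share these, so equal vectors get equal hashes.
        return hash((self.size, tuple(islice(self.nonzeros(), 16))))


class Dense(_VectorBase):
    def __init__(self, values):
        self.values = [float(v) for v in values]
        self.size = len(self.values)

    def nonzeros(self):
        return ((i, v) for i, v in enumerate(self.values) if v != 0.0)


class Sparse(_VectorBase):
    def __init__(self, size, indices, values):
        # Indices are assumed to be sorted and unique.
        self.size = size
        self.indices = list(indices)
        self.values = [float(v) for v in values]

    def nonzeros(self):
        # Skip explicitly stored zeros so they do not affect eq or hash.
        return ((i, v) for i, v in zip(self.indices, self.values) if v != 0.0)


dense = Dense([1.0, 0.0, 2.0])
sparse = Sparse(3, [0, 2], [1.0, 2.0])
sparse_with_zero = Sparse(3, [0, 1, 2], [1.0, 0.0, 2.0])

print(dense == sparse)                    # dense vs. sparse comparison
print(sparse == sparse_with_zero)         # explicit zero is ignored
print(hash(dense) == hash(sparse))        # hashes agree for equal vectors
```

With these definitions, vectors that are equal can be used interchangeably as dictionary keys or set members, which is the point of keeping the hash consistent with equality.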

> PySpark DenseVector, SparseVector should override __eq__
> --------------------------------------------------------
>
>                 Key: SPARK-9793
>                 URL: https://issues.apache.org/jira/browse/SPARK-9793
>             Project: Spark
>          Issue Type: Bug
>          Components: ML, PySpark
>    Affects Versions: 1.5.0
>            Reporter: Joseph K. Bradley
>            Priority: Critical
>
> See [SPARK-9750].
> PySpark DenseVector and SparseVector do not override the equality operator properly.
> They should use semantics, not representation, for comparison. (This is what Scala currently
> does.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)