spark-issues mailing list archives

From "Yanbo Liang (JIRA)" <>
Subject [jira] [Commented] (SPARK-9793) PySpark DenseVector, SparseVector should override __eq__
Date Sat, 15 Aug 2015 09:01:45 GMT


Yanbo Liang commented on SPARK-9793:

[~josephkb] I have combined this issue with SPARK-9940 and merged the two PRs into one.
The new PR makes PySpark Vector equality semantic and makes the hash use the first 16 entries,
as Scala does.
It fixes the following issues from [~mengxr]'s list at SPARK-9750:
* Python
** DenseVector: Semantic eq but only with `DenseVector`. Default hash. -> bug
** SparseVector: Semantic eq but wrong (only with `SparseVector`, and it does not handle explicit
zeros). Default hash. -> bug
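
To illustrate the behavior described above, here is a minimal sketch of semantic equality with a matching hash. It is not the code from the PR: the classes `SketchVector`, `SketchDenseVector` and `SketchSparseVector` and the helper `_nonzeros` are hypothetical stand-ins, and reading "first 16 entries" as "nonzero entries at positions 0 to 15" is an assumption; the exact rule in the Scala code may differ.

```python
# Hypothetical stand-ins for illustration only; these are NOT the
# pyspark.mllib.linalg classes and do not reproduce the PR's code.

HASH_PREFIX = 16  # assumption: only positions 0..15 contribute to the hash


class SketchVector(object):
    """Base class: compares vectors by content, not by representation."""

    def _nonzeros(self):
        # Subclasses return {index: value} for every nonzero entry.
        raise NotImplementedError

    def __eq__(self, other):
        if not isinstance(other, SketchVector):
            return NotImplemented
        # Dense and sparse vectors are equal when size and nonzeros match.
        return self.size == other.size and self._nonzeros() == other._nonzeros()

    def __ne__(self, other):
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __hash__(self):
        # Equal vectors have equal nonzeros, so they also hash alike.
        head = tuple(
            (i, v) for i, v in sorted(self._nonzeros().items()) if i < HASH_PREFIX
        )
        return hash((self.size, head))


class SketchDenseVector(SketchVector):
    def __init__(self, values):
        self.values = [float(v) for v in values]
        self.size = len(self.values)

    def _nonzeros(self):
        return {i: v for i, v in enumerate(self.values) if v != 0.0}


class SketchSparseVector(SketchVector):
    def __init__(self, size, indices, values):
        self.size = size
        self.indices = list(indices)
        self.values = [float(v) for v in values]

    def _nonzeros(self):
        # Explicitly stored zeros are dropped so they cannot affect equality.
        return {i: v for i, v in zip(self.indices, self.values) if v != 0.0}


dense = SketchDenseVector([1.0, 0.0, 3.0])
sparse = SketchSparseVector(3, [0, 1, 2], [1.0, 0.0, 3.0])  # explicit zero at index 1
same_content = dense == sparse
same_hash = hash(dense) == hash(sparse)
```

In this sketch both `same_content` and `same_hash` are true, covering the two cases listed as bugs: comparison across vector types and explicitly stored zeros. Hashing only a bounded prefix keeps the hash cheap for long vectors, at the cost of more collisions between vectors that differ only in later entries.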

> PySpark DenseVector, SparseVector should override __eq__
> --------------------------------------------------------
>                 Key: SPARK-9793
>                 URL:
>             Project: Spark
>          Issue Type: Bug
>          Components: ML, PySpark
>    Affects Versions: 1.5.0
>            Reporter: Joseph K. Bradley
>            Priority: Critical
> See [SPARK-9750].
> PySpark DenseVector and SparseVector do not override the equality operator properly.
 They should use semantics, not representation, for comparison.  (This is what Scala currently
