spark-reviews mailing list archives

From mengxr <...@git.apache.org>
Subject [GitHub] spark pull request: SPARK-11420 Updating Stddev support via Impera...
Date Wed, 11 Nov 2015 01:57:06 GMT
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/9380#issuecomment-155630541
  
    @JihongMA Could you merge the current master? There are some merge conflicts.
    
    For `NaN` vs. `null`, we had some discussion in https://issues.apache.org/jira/browse/SPARK-9079.
The design is to return `NaN` if there exist `NaN` values in the aggregation. I think we should
return `NaN` here, which is consistent with R and Python:
    
    ~~~R
    > mean(c())
    [1] NA
    > var(c(1))
    [1] NA
    ~~~
    
    ~~~python
    > np.mean([])
    Out[1] = nan
    > np.var([1], ddof=1)
    Out[2] = nan
    ~~~
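
As a rough illustration of the behavior argued for above, here is a minimal pure-Python sketch of a sample standard deviation that yields `NaN` for empty or single-element input and lets `NaN` inputs propagate. The function name `sample_stddev` and the Welford-style update are assumptions made for illustration only; this is not the `CentralMomentAgg` code from the PR.

```python
import math


def sample_stddev(values):
    """Sample standard deviation (ddof=1); hypothetical helper, not Spark code."""
    n = 0        # number of values seen so far
    mean = 0.0   # running mean
    m2 = 0.0     # running sum of squared deviations from the mean
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    # With fewer than two values the sample variance is undefined (0/0),
    # so return NaN rather than null/None.
    if n < 2:
        return float("nan")
    # A NaN anywhere in the input makes m2 NaN, and math.sqrt(NaN) is NaN.
    return math.sqrt(m2 / (n - 1))


if __name__ == "__main__":
    print(sample_stddev([]))
    print(sample_stddev([1.0]))
    print(sample_stddev([1.0, float("nan"), 3.0]))
    print(sample_stddev([1.0, 2.0, 3.0, 4.0]))
```

The explicit `n < 2` check stands in for the division by zero that a moment-based formula would otherwise hit; the first three calls all give `NaN`.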
    
    @marmbrus I think we can move the implementation from imperative to declarative in 1.7.
This PR re-uses `CentralMomentAgg` for `stddev`. It removes 70 lines of code, which
is a good sign :)

