spark-issues mailing list archives

From "Teng Peng (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-20077) Documentation for ml.stats.Correlation
Date Mon, 06 Nov 2017 03:16:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-20077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16239846#comment-16239846 ]

Teng Peng commented on SPARK-20077:
-----------------------------------

[~srowen] The page https://spark.apache.org/docs/latest/ml-statistics.html already documents
the Pearson and Spearman coefficients, as quoted below. Just want to make sure: do we need
anything beyond this?

Correlation computes the correlation matrix for the input Dataset of Vectors using the specified
method. The output will be a DataFrame that contains the correlation matrix of the column
of vectors.

import org.apache.spark.ml.linalg.{Matrix, Vectors}
import org.apache.spark.ml.stat.Correlation
import org.apache.spark.sql.Row

val data = Seq(
  Vectors.sparse(4, Seq((0, 1.0), (3, -2.0))),
  Vectors.dense(4.0, 5.0, 0.0, 3.0),
  Vectors.dense(6.0, 7.0, 0.0, 8.0),
  Vectors.sparse(4, Seq((0, 9.0), (3, 1.0)))
)

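// Assumes an active SparkSession named `spark` (as in spark-shell); toDF needs its implicits.
import spark.implicits._
val df = data.map(Tuple1.apply).toDF("features")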
val Row(coeff1: Matrix) = Correlation.corr(df, "features").head
println("Pearson correlation matrix:\n" + coeff1.toString)

val Row(coeff2: Matrix) = Correlation.corr(df, "features", "spearman").head
println("Spearman correlation matrix:\n" + coeff2.toString)
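
To make concrete what a single entry of the Pearson matrix holds, here is a minimal sketch in plain Scala with no Spark dependency. The object `PearsonSketch` and its `pearson` helper are hypothetical names introduced only for illustration; they are not part of Spark, and this is not a description of how Correlation.corr is implemented. The sketch takes columns 0 and 3 of the four example vectors above and applies the textbook formula.

```scala
// Hypothetical helper for illustration only; not part of the Spark API.
object PearsonSketch {
  // Pearson correlation of two equal-length columns of doubles.
  def pearson(x: Seq[Double], y: Seq[Double]): Double = {
    require(x.length == y.length && x.nonEmpty,
      "columns must be non-empty and of equal length")
    val n = x.length
    val meanX = x.sum / n
    val meanY = y.sum / n
    // Sum of products of deviations (covariance up to a constant factor).
    val cov = x.zip(y).map { case (a, b) => (a - meanX) * (b - meanY) }.sum
    // Sums of squared deviations (variances up to the same factor).
    val varX = x.map(a => (a - meanX) * (a - meanX)).sum
    val varY = y.map(b => (b - meanY) * (b - meanY)).sum
    // The constant factors cancel; a constant column gives 0.0 / 0.0 = NaN.
    cov / math.sqrt(varX * varY)
  }

  def main(args: Array[String]): Unit = {
    // Columns 0 and 3 of the four example vectors, written out densely.
    val col0 = Seq(1.0, 4.0, 6.0, 9.0)
    val col3 = Seq(-2.0, 3.0, 8.0, 1.0)
    println(pearson(col0, col3)) // about 0.40
  }
}
```

Column 2 of the example data is all zeros, so the same formula yields NaN for any pair that involves it.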



> Documentation for ml.stats.Correlation
> --------------------------------------
>
>                 Key: SPARK-20077
>                 URL: https://issues.apache.org/jira/browse/SPARK-20077
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML
>    Affects Versions: 2.1.0
>            Reporter: Timothy Hunter
>            Priority: Minor
>
> Now that (Pearson) correlations are available in spark.ml, we need to write some documentation
> to go along with this feature. For now, it can simply be based on the unit tests, for
> example.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

