spark-issues mailing list archives

From "Brian (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-21968) Improved KernelDensity support
Date Sun, 10 Sep 2017 18:49:00 GMT
Brian created SPARK-21968:
-----------------------------

             Summary: Improved KernelDensity support
                 Key: SPARK-21968
                 URL: https://issues.apache.org/jira/browse/SPARK-21968
             Project: Spark
          Issue Type: Improvement
          Components: MLlib
    Affects Versions: 2.2.0
            Reporter: Brian


Related to SPARK-7753. The KernelDensity API still does not provide a way to specify a kernel,
as described in SPARK-7753, and it requires clients to calculate their own optimal bandwidth.

Specifying a kernel could be something like:
def setKernel(kernel: Function1[Double, Double]): KernelDensity.this.type

The API could also ship a few built-in kernels that users can pass here, so they don't need
to implement each kernel themselves. Here are some example kernels:
https://en.wikipedia.org/wiki/Kernel_(statistics)#Kernel_functions_in_common_use
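
As a rough sketch of what such built-in kernels might look like, a few of the functions listed on that page can be written as plain Double => Double values. The Kernels object, the Kernel type alias and the local estimate helper are hypothetical names used only for illustration; none of them exist in Spark MLlib, and the helper runs on a local Seq rather than an RDD.

```scala
// Hypothetical helper object for illustration; not part of Spark MLlib.
object Kernels {
  // A kernel maps a scaled distance u = (x - x_i) / h to a weight.
  type Kernel = Double => Double

  // Gaussian: standard normal density.
  val gaussian: Kernel =
    u => math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.Pi)

  // Epanechnikov: 3/4 * (1 - u^2) on [-1, 1], zero elsewhere.
  val epanechnikov: Kernel =
    u => if (math.abs(u) <= 1.0) 0.75 * (1.0 - u * u) else 0.0

  // Uniform: 1/2 on [-1, 1], zero elsewhere.
  val uniform: Kernel =
    u => if (math.abs(u) <= 1.0) 0.5 else 0.0

  // Triangular: 1 - |u| on [-1, 1], zero elsewhere.
  val triangular: Kernel =
    u => if (math.abs(u) <= 1.0) 1.0 - math.abs(u) else 0.0

  // Local (non-distributed) density estimate at x:
  // f(x) = 1 / (n * h) * sum_i K((x - x_i) / h)
  def estimate(sample: Seq[Double], bandwidth: Double, kernel: Kernel)(x: Double): Double = {
    require(sample.nonEmpty, "sample must not be empty")
    require(bandwidth > 0.0, "bandwidth must be positive")
    sample.map(xi => kernel((x - xi) / bandwidth)).sum / (sample.size * bandwidth)
  }
}
```

With something along these lines, a caller could pass Kernels.epanechnikov to the proposed setKernel instead of writing the function by hand.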

Functions could also be provided to compute better bandwidth settings, so the user does not
need to calculate them, e.g. the "rule of thumb" and/or "solve the equation" bandwidth
described here:
https://en.wikipedia.org/wiki/Kernel_density_estimation#Bandwidth_selection
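
For illustration, the simplest "rule of thumb" on that page is the normal reference rule, h = (4 * sigma^5 / (3 * n))^(1/5), roughly 1.06 * sigma * n^(-1/5). The sketch below computes it for a local sample. The Bandwidth object and ruleOfThumb name are hypothetical, not part of Spark MLlib, and the rule assumes a Gaussian kernel and roughly normal data; the "solve the equation" selector is not covered.

```scala
// Hypothetical helper object for illustration; not part of Spark MLlib.
object Bandwidth {
  // Normal reference rule of thumb:
  // h = (4 * sigma^5 / (3 * n))^(1/5) = sigma * (4 / (3 * n))^(1/5)
  def ruleOfThumb(sample: Seq[Double]): Double = {
    require(sample.size >= 2, "need at least two points to estimate sigma")
    val n = sample.size
    val mean = sample.sum / n
    // Unbiased sample variance (divides by n - 1).
    val variance = sample.map(x => (x - mean) * (x - mean)).sum / (n - 1)
    val sigma = math.sqrt(variance)
    sigma * math.pow(4.0 / (3.0 * n), 0.2)
  }
}
```

The resulting value could then be handed to the existing setBandwidth setter, which is the step clients currently have to work out on their own.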




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

