spark-issues mailing list archives

From "Gang Bai (JIRA)" <>
Subject [jira] [Commented] (SPARK-2303) Poisson regression model for count data
Date Tue, 08 Jul 2014 03:28:34 GMT


Gang Bai commented on SPARK-2303:

This change has been merged into another JIRA SPARK-2311. Closing this one.

> Poisson regression model for count data
> ---------------------------------------
>                 Key: SPARK-2303
>                 URL:
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: Gang Bai
> Modeling count data is important in solving many real-world statistical problems.
> Currently mllib.regression provides models mostly for numeric data, i.e., fitting curves with
> various kinds of regularization on the resulting weights, but it still lacks support for count data modeling.
> A very basic model for count data is Poisson regression. Following the patterns in mllib
> and reusing its components, we estimate the parameters of the Poisson regression model by
> maximum likelihood. In detail, to add Poisson regression to mllib.regression, we
> need to:
>  # Add the gradient of the negative log-likelihood into mllib/optimization/Gradients.scala.
>  # Add the implementation of PoissonRegressionModel, which extends GeneralizedLinearModel
> with RegressionModel. Here we need to implement the predict method.
>  # Add the implementations of the generalized linear algorithm class. Here we can use
> either GradientDescent or LBFGS as the optimizer, so we implement both, as class PoissonRegressionWithSGD
> and class PoissonRegressionWithLBFGS respectively.
>  # Add the companion object PoissonRegressionWithSGD and PoissonRegressionWithLBFGS as
>  # Test suites
>  ## Test the gradient computation.
>  ## Test the regression method using generated data, which requires a PoissonRegressionDataGenerator.
>  ## Test the regression method using a real-world data set.
>  # Add the documentation.
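
As an illustration of the gradient and predict steps listed above, here is a minimal, self-contained Scala sketch. It assumes the standard log link, so the predicted mean is exp(w . x), the per-example negative log-likelihood (dropping the constant log(y!) term) is exp(w . x) - y * (w . x), and its gradient with respect to w is (exp(w . x) - y) * x. The object and method names (PoissonSketch, gradientAndLoss, fit) are hypothetical. The code does not use MLlib's Gradient or GeneralizedLinearModel classes and is not the implementation from the patch; plain batch gradient descent stands in for the optimizers.

```scala
object PoissonSketch {
  // Dot product of two equal-length dense vectors.
  def dot(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "dimension mismatch")
    var s = 0.0
    var i = 0
    while (i < a.length) { s += a(i) * b(i); i += 1 }
    s
  }

  // Predicted mean count under the (assumed) log link: exp(w . x).
  def predict(w: Array[Double], x: Array[Double]): Double =
    math.exp(dot(w, x))

  // Per-example gradient and loss of the negative log-likelihood,
  // with the constant log(y!) term dropped.
  def gradientAndLoss(
      x: Array[Double],
      y: Double,
      w: Array[Double]): (Array[Double], Double) = {
    val eta = dot(w, x)          // linear predictor
    val mu = math.exp(eta)       // predicted mean
    val grad = x.map(_ * (mu - y))
    val loss = mu - y * eta
    (grad, loss)
  }

  // Plain batch gradient descent on the averaged gradient.
  // Hypothetical helper for illustration only.
  def fit(
      data: Seq[(Double, Array[Double])],
      numFeatures: Int,
      stepSize: Double,
      numIterations: Int): Array[Double] = {
    var w = Array.fill(numFeatures)(0.0)
    var iter = 0
    while (iter < numIterations) {
      val sum = Array.fill(numFeatures)(0.0)
      data.foreach { case (y, x) =>
        val (g, _) = gradientAndLoss(x, y, w)
        var j = 0
        while (j < numFeatures) { sum(j) += g(j); j += 1 }
      }
      w = w.zip(sum).map { case (wj, gj) => wj - stepSize * gj / data.size }
      iter += 1
    }
    w
  }
}
```

With a single constant feature, the maximum likelihood estimate of the weight is the log of the sample mean, which gives a simple sanity check for the gradient: calling PoissonSketch.fit on counts 1, 2, 3 with feature value 1.0 should move the weight toward log(2).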
