spark-issues mailing list archives

From "Patrick Wendell (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (SPARK-1212) Support sparse data in MLlib
Date Thu, 03 Apr 2014 07:23:14 GMT

     [ https://issues.apache.org/jira/browse/SPARK-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell resolved SPARK-1212.
------------------------------------

    Resolution: Fixed

> Support sparse data in MLlib
> ----------------------------
>
>                 Key: SPARK-1212
>                 URL: https://issues.apache.org/jira/browse/SPARK-1212
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 0.9.0
>            Reporter: Xiangrui Meng
>            Assignee: Xiangrui Meng
>            Priority: Blocker
>             Fix For: 1.0.0
>
>
> MLlib's NaiveBayes, SGD, and KMeans accept RDD[LabeledPoint] for training and RDD[Array[Double]] for prediction, where LabeledPoint is a wrapper of (Double, Array[Double]). Using Array[Double] can give good performance, but sparse data appears quite often in practice. I created this JIRA to discuss the plan for adding sparse data support to MLlib and to track its progress.
> The goal is to support sparse data for training and prediction in all existing algorithms in MLlib:
> * Gradient Descent
> * K-Means
> * Naive Bayes
> Previous discussions and pull requests:
> * https://github.com/mesos/spark/pull/736
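
The motivation in the description above can be illustrated with a minimal sketch. It is not MLlib's API and not the design adopted for this issue: the `SparseVector` class, its field names, and the `dense_dot` helper are hypothetical and exist only to contrast a dense array of doubles with a representation that stores just the nonzero entries.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class SparseVector:
    """Hypothetical sparse vector: only nonzero entries are stored."""
    size: int                # logical length of the vector
    indices: Sequence[int]   # strictly increasing positions of the nonzeros
    values: Sequence[float]  # values[k] is the entry at position indices[k]

    def dot(self, weights: Sequence[float]) -> float:
        # Work is proportional to the number of nonzeros, not to `size`.
        return sum(v * weights[i] for i, v in zip(self.indices, self.values))

    def to_dense(self) -> List[float]:
        # Expand to the dense form, analogous to an Array[Double].
        dense = [0.0] * self.size
        for i, v in zip(self.indices, self.values):
            dense[i] = v
        return dense


def dense_dot(features: Sequence[float], weights: Sequence[float]) -> float:
    # Dense dot product: touches every entry, including the zeros.
    return sum(x * w for x, w in zip(features, weights))


# Illustrative data only: a 6-dimensional vector with two nonzero entries.
weights = [0.5, -1.0, 2.0, 0.0, 1.5, 3.0]
sparse = SparseVector(size=6, indices=[1, 4], values=[2.0, 4.0])
```

Both forms give the same dot product against `weights`, but the sparse form stores and visits two entries instead of six. That gap grows with dimensionality, which is why the choice matters for the gradient descent, k-means, and naive Bayes computations listed in the description.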



--
This message was sent by Atlassian JIRA
(v6.2#6252)
