spark-issues mailing list archives

From "Joseph K. Bradley (JIRA)" <>
Subject [jira] [Updated] (SPARK-7674) R-like stats for ML models
Date Tue, 30 Jun 2015 02:13:04 GMT


Joseph K. Bradley updated SPARK-7674:
    Shepherd: Joseph K. Bradley

> R-like stats for ML models
> --------------------------
>                 Key: SPARK-7674
>                 URL:
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML
>            Reporter: Joseph K. Bradley
>            Assignee: Joseph K. Bradley
>            Priority: Critical
> This is an umbrella JIRA for supporting ML model summaries and statistics, following
> the example of R's summary() and plot() functions.
> [Design doc|]
> From the design doc:
> {quote}
> R and its well-established packages provide extensive functionality for inspecting a
> model and its results.  This inspection is critical to interpreting, debugging and improving
> models.
> R is arguably a gold standard for a statistics/ML library, so this doc largely attempts
> to imitate it.  The challenge we face is supporting similar functionality, but on big (distributed)
> data.  Data size makes both efficient computation and meaningful displays/summaries difficult.
> R model and result summaries generally take 2 forms:
> * summary(model): Display text with information about the model and results on data
> * plot(model): Display plots about the model and results
> We aim to provide both of these types of information.  Visualization for the plottable
> results will not be supported in MLlib itself, but we can provide results in a form which
> can be plotted easily with other tools.
> {quote}
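
The split described in the quote above (a text summary, plus plottable results returned as plain data rather than rendered) can be illustrated with a minimal sketch. This is not MLlib code and does not reflect the design doc's actual API: the names `LinearFitSummary` and `summarize_linear_fit` are hypothetical, and a single-variable least-squares fit on in-memory lists stands in for a distributed model.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class LinearFitSummary:
    """Hypothetical summary object for illustration; not an MLlib class."""
    slope: float
    intercept: float
    r_squared: float
    # (fitted value, residual) pairs: plain data that any plotting tool can consume
    residuals_vs_fitted: List[Tuple[float, float]]

    def __str__(self) -> str:
        # Text form, loosely analogous to R's summary(model)
        return (f"slope={self.slope:.4f} intercept={self.intercept:.4f} "
                f"R^2={self.r_squared:.4f}")


def summarize_linear_fit(xs: Sequence[float], ys: Sequence[float]) -> LinearFitSummary:
    """Fit y = slope * x + intercept by ordinary least squares and summarize it."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    fitted = [slope * x + intercept for x in xs]
    residuals = [y - f for y, f in zip(ys, fitted)]
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot
    return LinearFitSummary(slope, intercept, r_squared, list(zip(fitted, residuals)))


summary = summarize_linear_fit([1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8])
print(summary)  # text summary of the fit
```

The `residuals_vs_fitted` pairs are left unrendered so they can be handed to whatever plotting tool is available. On distributed data, a field like this would presumably need to be sampled or aggregated before being returned, which is the difficulty the quote points to.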
