spark-issues mailing list archives

From "Max Kaznady (JIRA)" <>
Subject [jira] [Commented] (SPARK-3727) DecisionTree, RandomForest: More prediction functionality
Date Mon, 13 Apr 2015 19:19:14 GMT


Max Kaznady commented on SPARK-3727:

Yes, probabilities have to be added to other models too, like LogisticRegression. Right now
they are hardcoded in two places but are not exposed in PySpark.

I think it makes sense to split this up: first PySpark, then classification, then probabilities,
and then group the different types of algorithms that all output probabilities: Logistic Regression,
Random Forest, etc.

We can also add probabilities for trees by counting the number of 1's and 0's at the leaves.
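
A rough sketch of the leaf-counting idea, in plain Python rather than the actual MLlib code. It assumes each leaf keeps the training labels that reached it, which is one reading of the suggestion; the function names `leaf_probabilities` and `forest_probabilities` are made up for illustration and are not part of Spark.

```python
from collections import Counter


def leaf_probabilities(leaf_labels):
    """Estimate class probabilities from the training labels that reached a leaf.

    Assumption: the leaf stores (or can recover) its training labels.
    """
    counts = Counter(leaf_labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("leaf has no training examples")
    return {label: n / total for label, n in counts.items()}


def forest_probabilities(per_tree_probs):
    """Average the per-tree probability dicts (hypothetical aggregation rule)."""
    labels = set().union(*per_tree_probs)
    n_trees = len(per_tree_probs)
    return {
        label: sum(p.get(label, 0.0) for p in per_tree_probs) / n_trees
        for label in labels
    }


# A leaf that saw three 1's and one 0 during training.
leaf = leaf_probabilities([1, 1, 1, 0])
# Two trees that disagree about the same input.
forest = forest_probabilities([{1: 0.75, 0: 0.25}, {1: 0.25, 0: 0.75}])
```

Averaging the per-tree probabilities is only one option for a forest; counting how many trees vote for each label would be another.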

What do you think?

> DecisionTree, RandomForest: More prediction functionality
> ---------------------------------------------------------
>                 Key: SPARK-3727
>                 URL:
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>            Reporter: Joseph K. Bradley
> DecisionTree and RandomForest currently predict the most likely label for classification
> and the mean for regression.  Other info about predictions would be useful.
> For classification: estimated probability of each possible label
> For regression: variance of estimate
> RandomForest could also create aggregate predictions in multiple ways:
> * Predict mean or median value for regression.
> * Compute variance of estimates (across all trees) for both classification and regression.
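
The aggregate predictions listed in the issue description could look roughly like the following, assuming the per-tree regression predictions for one input are already available as a list of floats. The helper name `aggregate_tree_predictions` is hypothetical, and the use of population variance (rather than sample variance) is an assumption; the issue does not say which is intended.

```python
import statistics


def aggregate_tree_predictions(tree_predictions):
    """Summarize one input's predictions across all trees of a forest."""
    if not tree_predictions:
        raise ValueError("need at least one tree prediction")
    return {
        "mean": statistics.mean(tree_predictions),
        "median": statistics.median(tree_predictions),
        # Population variance across trees (assumed; sample variance is also plausible).
        "variance": statistics.pvariance(tree_predictions),
    }


summary = aggregate_tree_predictions([1.0, 2.0, 3.0, 6.0])
```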
