spark-issues mailing list archives

From "Joseph K. Bradley (JIRA)" <>
Subject [jira] [Updated] (SPARK-14351) Optimize ImpurityAggregator for decision trees
Date Tue, 14 Jun 2016 04:02:40 GMT


Joseph K. Bradley updated SPARK-14351:
    Priority: Major  (was: Minor)

> Optimize ImpurityAggregator for decision trees
> ----------------------------------------------
>                 Key: SPARK-14351
>                 URL:
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>            Reporter: Joseph K. Bradley
> {{RandomForest.binsToBestSplit}} currently takes a large amount of time.  Based on some
> quick profiling, I believe a big chunk of this is spent in {{ImpurityAggregator.getCalculator}}
> (which seems to make unnecessary Array copies) and {{RandomForest.calculateImpurityStats}}.
> This JIRA is for:
> * Doing more profiling to confirm that unnecessary time is being spent in some of these methods
> * Optimizing the implementation
> * Profiling again to confirm the speedups
> Local profiling for large enough examples should suffice, especially since the optimizations
> should not need to change the amount of data communicated.
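
For illustration only, here is a minimal Python sketch of the kind of copy the description suspects. It is not Spark's code: the flat `all_stats` buffer layout, the function names, and the choice of Gini impurity are all assumptions made for this example. The first function copies a bin's class counts into a new list on every call; the second reads the same counts in place by offset.

```python
from typing import Sequence


def gini_from_copy(all_stats: Sequence[float], offset: int, stats_size: int) -> float:
    """Copy one bin's class counts out of the shared buffer, then compute Gini impurity."""
    # Hypothetical layout: counts for one bin occupy all_stats[offset : offset + stats_size].
    counts = list(all_stats[offset:offset + stats_size])  # new list allocated on every call
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)


def gini_in_place(all_stats: Sequence[float], offset: int, stats_size: int) -> float:
    """Compute the same Gini impurity by indexing into the shared buffer, without copying it."""
    total = 0.0
    sum_sq = 0.0
    for i in range(offset, offset + stats_size):
        c = all_stats[i]
        total += c
        sum_sq += c * c
    if total == 0:
        return 0.0
    return 1.0 - sum_sq / (total * total)


# Two bins with two class counts each, stored back to back in one flat buffer.
all_stats = [3.0, 1.0, 2.0, 2.0]
stats_size = 2
impurities = [gini_in_place(all_stats, b * stats_size, stats_size) for b in range(2)]
```

Both functions return the same value for a given bin. Whether avoiding the copy produces a measurable speedup in the real code path is exactly what the profiling steps listed above are meant to confirm.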

This message was sent by Atlassian JIRA
