spark-issues mailing list archives

From "Wenmin Wu (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-10629) Gradient boosted trees: mapPartitions input size increasing
Date Wed, 16 Sep 2015 02:26:45 GMT

     [ https://issues.apache.org/jira/browse/SPARK-10629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenmin Wu updated SPARK-10629:
------------------------------
    Description: 
First of all, I think my problem is quite different from https://issues.apache.org/jira/browse/SPARK-10433,
which points out that the input size increases at each iteration.

My problem is that the mapPartitions input size increases within a single iteration. My training
samples have 2958359 features in total. Within one iteration, 3 collectAsMap operations were called.
Here is a summary of each call.

| Tables        | Are           | Cool  |
| ------------- |:-------------:| -----:|
| col 3 is      | right-aligned | $1600 |
| col 2 is      | centered      |   $12 |
| zebra stripes | are neat      |    $1 |

  was:
First of all, I think my problem is quite different from https://issues.apache.org/jira/browse/SPARK-10433,
which points out that the input size increases at each iteration.

My problem is that the mapPartitions input size increases within a single iteration. My training
samples have 2958359 features in total. Within one iteration, 3 collectAsMap operations were called.
Here is a summary of each call.

stage ID 4 mapPartitions at DecisionTree.scala:613 


> Gradient boosted trees: mapPartitions input size increasing 
> ------------------------------------------------------------
>
>                 Key: SPARK-10629
>                 URL: https://issues.apache.org/jira/browse/SPARK-10629
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib
>    Affects Versions: 1.4.1
>            Reporter: Wenmin Wu
>
> First of all, I think my problem is quite different from https://issues.apache.org/jira/browse/SPARK-10433,
> which points out that the input size increases at each iteration.
> My problem is that the mapPartitions input size increases within a single iteration. My training
> samples have 2958359 features in total. Within one iteration, 3 collectAsMap operations were called.
> Here is a summary of each call.
> | Tables        | Are           | Cool  |
> | ------------- |:-------------:| -----:|
> | col 3 is      | right-aligned | $1600 |
> | col 2 is      | centered      |   $12 |
> | zebra stripes | are neat      |    $1 |



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

