spark-issues mailing list archives

From "Apache Spark (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (SPARK-19189) Optimize CartesianRDD to avoid parent RDD partition re-computation and re-serialization
Date Fri, 13 Jan 2017 12:42:26 GMT

     [ https://issues.apache.org/jira/browse/SPARK-19189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-19189:
------------------------------------

    Assignee:     (was: Apache Spark)

> Optimize CartesianRDD to avoid parent RDD partition re-computation and re-serialization
> ---------------------------------------------------------------------------------------
>
>                 Key: SPARK-19189
>                 URL: https://issues.apache.org/jira/browse/SPARK-19189
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.1.0
>            Reporter: Weichen Xu
>            Priority: Minor
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> In the current CartesianRDD implementation, suppose RDDA cartesian RDDB generates RDDC.
> Each RDDA partition is read by multiple RDDC partitions, and RDDB has the same problem.
> As a result, when the RDDC partitions are computed, each partition's data in RDDA or RDDB
> is serialized repeatedly (and then transferred over the network). If RDDA or RDDB has not
> been persisted, this also causes the RDD to be recomputed repeatedly.
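
The repeated reads described above can be illustrated with a small sketch in plain Python. It is not Spark code and makes no Spark calls. It assumes only that a cartesian product produces one RDDC partition per (RDDA partition, RDDB partition) pair; the function name and the partition counts 3 and 4 are hypothetical, chosen for illustration.

```python
from collections import Counter
from itertools import product


def cartesian_partition_reads(num_a, num_b):
    """Count how many RDDC partitions read each parent partition.

    Assumption for illustration: RDDC has one partition for every
    (RDDA partition index, RDDB partition index) pair.
    """
    reads_a = Counter()
    reads_b = Counter()
    for i, j in product(range(num_a), range(num_b)):
        # Computing RDDC partition (i, j) reads RDDA partition i
        # and RDDB partition j.
        reads_a[i] += 1
        reads_b[j] += 1
    return reads_a, reads_b


# Hypothetical sizes: RDDA has 3 partitions, RDDB has 4.
reads_a, reads_b = cartesian_partition_reads(3, 4)
print(dict(reads_a))  # each RDDA partition is read once per RDDB partition
print(dict(reads_b))  # each RDDB partition is read once per RDDA partition
```

Under that assumption, every RDDA partition is read as many times as RDDB has partitions, and vice versa. Each of those reads is a point where the data may be serialized and transferred again, or recomputed if the parent RDD is not persisted.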



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

