systemml-issues mailing list archives

From "LI Guobao (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SYSTEMML-2197) Multi-threaded broadcast creation
Date Thu, 29 Mar 2018 14:48:00 GMT

    [ https://issues.apache.org/jira/browse/SYSTEMML-2197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16419113#comment-16419113
] 

LI Guobao commented on SYSTEMML-2197:
-------------------------------------

Thanks [~mboehm7] for the details. Which test should I run to verify this change?
Thanks.

> Multi-threaded broadcast creation
> ---------------------------------
>
>                 Key: SYSTEMML-2197
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-2197
>             Project: SystemML
>          Issue Type: Task
>            Reporter: Matthias Boehm
>            Priority: Major
>
> All Spark instructions that broadcast one of the input operands rely on a shared primitive
> {{sec.getBroadcastForVariable(var)}} for creating partitioned broadcasts, which are wrapper
> objects around potentially many broadcast variables to overcome Spark's 2GB limitation for compressed
> broadcasts. Each individual broadcast blocks the matrix into squared blocks for direct access
> without an unnecessary copy per task. So far, this broadcast creation is single-threaded.
> This task aims to parallelize the blocking of the given in-memory matrix into squared
> blocks (https://github.com/apache/systemml/blob/master/src/main/java/org/apache/sysml/runtime/instructions/spark/data/PartitionedBlock.java#L82)
> as well as the subsequent partition creation and actual broadcasting (https://github.com/apache/systemml/blob/master/src/main/java/org/apache/sysml/runtime/controlprogram/context/SparkExecutionContext.java#L548).

> For consistency and to avoid excessive over-provisioning, this multi-threading
> should use the common internal thread pool or parallel Java streams, which similarly use
> the shared {{ForkJoinPool.commonPool}}. An example is the multi-threaded parallelization of
> RDDs, which similarly blocks a given matrix into its squared blocks (see https://github.com/apache/systemml/blob/master/src/main/java/org/apache/sysml/runtime/controlprogram/context/SparkExecutionContext.java#L679).
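
To illustrate the approach described in the issue, here is a minimal sketch of blocking an in-memory matrix into squared blocks with a parallel Java stream. It is not SystemML code: the class ParallelBlockingSketch, its methods, the plain double[][] matrix representation, and the dense 8-bytes-per-cell size estimate are all assumptions made for illustration. The only point it is meant to show is that a parallel stream runs on the shared ForkJoinPool.commonPool() by default, so no additional thread pool is created.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Hypothetical sketch; none of these names are SystemML APIs.
class ParallelBlockingSketch {

    // Number of blocks needed to cover len cells with blocks of size blen.
    static int numBlocks(int len, int blen) {
        return (len + blen - 1) / blen;
    }

    // Copies one block (row-block index rb, column-block index cb) out of m.
    // Blocks at the right and bottom borders may be smaller than blen x blen.
    static double[][] copyBlock(double[][] m, int rb, int cb, int blen) {
        int r0 = rb * blen, c0 = cb * blen;
        int r1 = Math.min(r0 + blen, m.length);
        int c1 = Math.min(c0 + blen, m[0].length);
        double[][] block = new double[r1 - r0][];
        for (int r = r0; r < r1; r++)
            block[r - r0] = Arrays.copyOfRange(m[r], c0, c1);
        return block;
    }

    // Cuts a dense matrix into blocks, returned in row-major block order.
    static double[][][] toBlocks(double[][] m, int blen) {
        int rows = m.length;
        int cols = rows == 0 ? 0 : m[0].length;
        int nrb = numBlocks(rows, blen);
        int ncb = numBlocks(cols, blen);
        // Parallel streams execute on ForkJoinPool.commonPool() by default.
        // toArray keeps the encounter order, so block ix stays at index ix.
        return IntStream.range(0, nrb * ncb).parallel()
            .mapToObj(ix -> copyBlock(m, ix / ncb, ix % ncb, blen))
            .toArray(double[][][]::new);
    }

    // Assumed sizing rule: how many dense blocks fit into one partition of
    // at most maxBytes, with 8 bytes per cell and at least one block each.
    static int blocksPerPartition(long maxBytes, int blen) {
        long blockBytes = 8L * blen * blen;
        return (int) Math.min(Integer.MAX_VALUE, Math.max(1L, maxBytes / blockBytes));
    }

    public static void main(String[] args) {
        double[][] m = new double[5][5];
        for (int i = 0; i < 5; i++)
            for (int j = 0; j < 5; j++)
                m[i][j] = i * 5 + j;
        double[][][] blocks = toBlocks(m, 2);
        System.out.println("blocks: " + blocks.length);
        System.out.println("blocks per partition: " + blocksPerPartition(1024, 2));
    }
}
```

Each block is copied independently of the others, so the per-block work needs no synchronization. Grouping the resulting blocks into partitions and broadcasting each partition could be parallelized in the same way, although the actual partition sizing in SystemML may differ from the rule assumed here.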



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
