systemml-issues mailing list archives

From "LI Guobao (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SYSTEMML-2418) Spark data partitioner
Date Tue, 26 Jun 2018 23:06:00 GMT

     [ https://issues.apache.org/jira/browse/SYSTEMML-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

LI Guobao updated SYSTEMML-2418:
--------------------------------
    Summary: Spark data partitioner  (was: Distributing data to workers)

> Spark data partitioner
> ----------------------
>
>                 Key: SYSTEMML-2418
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-2418
>             Project: SystemML
>          Issue Type: Sub-task
>            Reporter: LI Guobao
>            Assignee: LI Guobao
>            Priority: Major
>
> In the context of the parameter server (ps), the training data is partitioned according to
one of several schemes. This partitioning is executed on the driver node, and the partitioned
data should then be distributed to the workers via broadcast. Because of the 2 GB limit on a
Spark broadcast, we could leverage the _PartitionedBroadcast_ class for this step. Afterwards,
the partitioned broadcast object can be passed to the workers so that each can launch its job.
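
The flow described above can be illustrated with a minimal sketch in plain Python. It does not use Spark or the SystemML _PartitionedBroadcast_ class; the function names, the "disjoint contiguous" scheme, and the byte-size parameters are all hypothetical and chosen only to show the idea: first split the rows among the workers, then cut each worker's share into chunks that stay under a size limit so that no single broadcast piece is too large.

```python
from typing import List

Row = List[float]


def partition_disjoint_contiguous(rows: List[Row], num_workers: int) -> List[List[Row]]:
    """Hypothetical scheme: give each worker one contiguous, disjoint slice of rows."""
    if num_workers <= 0:
        raise ValueError("num_workers must be positive")
    base, extra = divmod(len(rows), num_workers)
    parts, start = [], 0
    for w in range(num_workers):
        # The first `extra` workers receive one additional row.
        size = base + (1 if w < extra else 0)
        parts.append(rows[start:start + size])
        start += size
    return parts


def chunk_for_broadcast(part: List[Row], bytes_per_row: int, max_chunk_bytes: int) -> List[List[Row]]:
    """Cut one worker's partition into chunks that each fit under max_chunk_bytes.

    Stands in for the idea of a partitioned broadcast; the size accounting is an
    assumption (a fixed number of bytes per row), not how Spark measures it.
    """
    if bytes_per_row <= 0 or bytes_per_row > max_chunk_bytes:
        raise ValueError("a single row must fit within one chunk")
    rows_per_chunk = max_chunk_bytes // bytes_per_row
    return [part[i:i + rows_per_chunk] for i in range(0, len(part), rows_per_chunk)]


if __name__ == "__main__":
    data = [[float(i)] for i in range(10)]          # 10 rows, 1 column
    per_worker = partition_disjoint_contiguous(data, 3)
    for worker_id, part in enumerate(per_worker):
        chunks = chunk_for_broadcast(part, bytes_per_row=8, max_chunk_bytes=16)
        print(worker_id, [len(c) for c in chunks])
```

In a real job the driver would build the per-worker partitions once and hand each worker a reference to the chunked broadcast object, rather than shipping the full data set with every task.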



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
