systemml-issues mailing list archives

From "LI Guobao (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SYSTEMML-2420) Communication between ps and workers
Date Mon, 25 Jun 2018 23:01:00 GMT

     [ https://issues.apache.org/jira/browse/SYSTEMML-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

LI Guobao updated SYSTEMML-2420:
--------------------------------
    Description: This task aims to implement the parameter exchange between the ps and the workers. We could
leverage the netty framework to implement our own RPC framework. In general, the netty {{TransportClient}}
and {{TransportServer}} provide the sending and receiving services for the ps and the workers. Extending
the {{RpcHandler}} allows the server to invoke the corresponding ps method (i.e., push or pull) by
handling the different incoming RPC call objects. The {{SparkPsProxy}}, which wraps a {{TransportClient}},
then allows the workers to execute push/pull calls against the server. At the same time, the ps netty
server also provides a file repository service from which the workers can download the
partitioned training data, so that each worker can rebuild its matrix object from the transferred
file. This avoids broadcasting all the files with Spark, since not every file is needed by every
worker.  (was: This task aims to implement the parameter exchange between the ps and the workers. We could
leverage Spark RPC to set up a ps endpoint on the driver node, so that the ps service can
be discovered by the workers in the network. The workers could then invoke the pull/push methods
via RPC using the registered endpoint of the ps service. In detail, this task consists
of registering the ps endpoint in the Spark RPC framework and using RPC to invoke the target method
on the worker side. Spark RPC is implemented in Scala, so we need to
wrap it in order to use it from Java. Overall, we could register the ps service with _RpcEndpoint_
and invoke the service with _RpcEndpointRef_.)

> Communication between ps and workers
> ------------------------------------
>
>                 Key: SYSTEMML-2420
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-2420
>             Project: SystemML
>          Issue Type: Sub-task
>            Reporter: LI Guobao
>            Assignee: LI Guobao
>            Priority: Major
>         Attachments: systemml_rpc_2_seq_diagram.png, systemml_rpc_sequence_diagram.png
>
>
> This task aims to implement the parameter exchange between the ps and the workers. We could leverage
the netty framework to implement our own RPC framework. In general, the netty {{TransportClient}}
and {{TransportServer}} provide the sending and receiving services for the ps and the workers. Extending
the {{RpcHandler}} allows the server to invoke the corresponding ps method (i.e., push or pull) by
handling the different incoming RPC call objects. The {{SparkPsProxy}}, which wraps a {{TransportClient}},
then allows the workers to execute push/pull calls against the server. At the same time, the ps netty
server also provides a file repository service from which the workers can download the
partitioned training data, so that each worker can rebuild its matrix object from the transferred
file. This avoids broadcasting all the files with Spark, since not every file is needed by every
worker.
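
The push/pull dispatch described above could be sketched roughly as follows. This is only a
minimal illustration and not the SystemML implementation: it does not use the netty
{{TransportClient}}, {{TransportServer}} or {{RpcHandler}} classes, and the network hop is replaced by a
direct method call. The names PsRpcSketch, ParamServer, PsHandler and PsProxy, the PUSH/PULL
tags, the byte layout of a call, and the "add the pushed values to the model" update are all
assumptions made for the example. The file repository service is not covered.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class PsRpcSketch {
    // Hypothetical method tags carried at the start of each RPC call.
    static final int PUSH = 1;
    static final int PULL = 2;

    /** Stand-in parameter server; the update rule (plain addition) is a placeholder. */
    static class ParamServer {
        private final double[] model;
        ParamServer(int size) { model = new double[size]; }
        synchronized void push(int workerId, double[] gradients) {
            for (int i = 0; i < model.length; i++) model[i] += gradients[i];
        }
        synchronized double[] pull(int workerId) { return model.clone(); }
    }

    /** Assumed call layout: method tag, worker id, payload length, payload values. */
    static ByteBuffer encodeCall(int method, int workerId, double[] payload) {
        ByteBuffer buf = ByteBuffer.allocate(3 * Integer.BYTES + payload.length * Double.BYTES);
        buf.putInt(method).putInt(workerId).putInt(payload.length);
        for (double v : payload) buf.putDouble(v);
        buf.flip();
        return buf;
    }

    /** Assumed response layout: value count followed by the values. */
    static ByteBuffer encodeResponse(double[] values) {
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + values.length * Double.BYTES);
        buf.putInt(values.length);
        for (double v : values) buf.putDouble(v);
        buf.flip();
        return buf;
    }

    static double[] readDoubles(ByteBuffer buf) {
        int n = buf.getInt();
        double[] out = new double[n];
        for (int i = 0; i < n; i++) out[i] = buf.getDouble();
        return out;
    }

    /** Server side: plays the role of the extended RpcHandler by dispatching on the tag. */
    static class PsHandler {
        private final ParamServer ps;
        PsHandler(ParamServer ps) { this.ps = ps; }
        ByteBuffer receive(ByteBuffer message) {
            int method = message.getInt();
            int workerId = message.getInt();
            double[] payload = readDoubles(message);
            switch (method) {
                case PUSH:
                    ps.push(workerId, payload);
                    return encodeResponse(new double[0]); // empty acknowledgement
                case PULL:
                    return encodeResponse(ps.pull(workerId));
                default:
                    throw new IllegalArgumentException("Unknown method tag: " + method);
            }
        }
    }

    /** Worker side: plays the role of SparkPsProxy; the handler stands in for the transport. */
    static class PsProxy {
        private final PsHandler transport;
        PsProxy(PsHandler transport) { this.transport = transport; }
        void push(int workerId, double[] gradients) {
            transport.receive(encodeCall(PUSH, workerId, gradients));
        }
        double[] pull(int workerId) {
            return readDoubles(transport.receive(encodeCall(PULL, workerId, new double[0])));
        }
    }

    public static void main(String[] args) {
        PsProxy proxy = new PsProxy(new PsHandler(new ParamServer(3)));
        proxy.push(0, new double[] {1.0, 2.0, 3.0});
        proxy.push(1, new double[] {0.5, 0.5, 0.5});
        System.out.println(Arrays.toString(proxy.pull(0))); // prints [1.5, 2.5, 3.5]
    }
}
```

The point of the sketch is that the workers only see push and pull on the proxy, while the
handler on the server side decides which ps method to invoke from the content of the call
object. In a real setup the direct call to the handler would be replaced by a request sent
over the network.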



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
