systemml-issues mailing list archives

From "Matthias Boehm (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SYSTEMML-2336) Data partition
Date Sun, 20 May 2018 23:55:00 GMT

    [ https://issues.apache.org/jira/browse/SYSTEMML-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482086#comment-16482086
] 

Matthias Boehm edited comment on SYSTEMML-2336 at 5/20/18 11:54 PM:
--------------------------------------------------------------------

I would recommend simply leveraging existing operations. Similar to our approach for constant
folding, you can temporarily construct hops and execute the generated instructions to perform
the data partitioning. In detail, here is how I would map the different schemes:
* Disjoint_Contiguous: for each worker, use a right indexing operation {{X[beg:end,]}} to
obtain contiguous, non-overlapping partitions of rows.
* Disjoint_Round_Robin: for each worker, use a permutation multiply or, more simply, a removeEmpty
such as {{removeEmpty(target=X, margin="rows", select=(seq(1,nrow(X))%%k)==id)}}.
* Disjoint_Random: for each worker, use a permutation multiply {{P[beg:end,] %*% X}}, where
P is constructed, for example, with {{P=table(seq(1,nrow(X)), sample(nrow(X), nrow(X)))}}, i.e.,
sampling without replacement to ensure disjointness.
* Overlap_Reshuffle: similar to the above, except that you create a new permutation matrix for
each worker and omit the indexing on P.
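
To make the row assignment of the four schemes concrete, here is a minimal sketch in plain Python rather than DML. It only models which row indices (0-based) each worker would receive. It does not construct hops or execute SystemML instructions. The function names, the ceiling-based block size, and the worker ids 0..k-1 are assumptions made for illustration and are not taken from the SystemML code base.

```python
import random


def disjoint_contiguous(n_rows, k):
    """Worker i gets one contiguous block of rows, like X[beg:end,]."""
    size = -(-n_rows // k)  # ceiling division (assumed block size)
    return [list(range(i * size, min((i + 1) * size, n_rows)))
            for i in range(k)]


def disjoint_round_robin(n_rows, k):
    """Mirrors select=(seq(1,nrow(X)) %% k) == id with 1-based positions."""
    return [[r for r in range(n_rows) if (r + 1) % k == wid]
            for wid in range(k)]


def disjoint_random(n_rows, k, rng):
    """One shared permutation, sliced into contiguous blocks (P[beg:end,])."""
    perm = rng.sample(range(n_rows), n_rows)  # sampling without replacement
    return [[perm[p] for p in block]
            for block in disjoint_contiguous(n_rows, k)]


def overlap_reshuffle(n_rows, k, rng):
    """Every worker gets its own full permutation of all rows."""
    return [rng.sample(range(n_rows), n_rows) for _ in range(k)]


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed only to make runs repeatable
    print("contiguous :", disjoint_contiguous(10, 3))
    print("round robin:", disjoint_round_robin(10, 3))
    print("random     :", disjoint_random(10, 3, rng))
    print("reshuffle  :", overlap_reshuffle(10, 3, rng))
```

In the three disjoint schemes every row lands on exactly one worker, whereas overlap_reshuffle hands each worker all rows in a different order. With the ceiling-based block size assumed here, trailing workers can receive fewer rows, or none when k does not divide the row count evenly.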

It's probably a good idea to start simple. Hence, I would recommend implementing disjoint_contiguous
first and getting a basic local parameter server running. Once this is done, we can come back
to the other data partitioning schemes.



> Data partition
> --------------
>
>                 Key: SYSTEMML-2336
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-2336
>             Project: SystemML
>          Issue Type: Sub-task
>            Reporter: LI Guobao
>            Assignee: LI Guobao
>            Priority: Major
>
> It aims to implement the four data partitioning schemes (i.e., disjoint_contiguous, disjoint_round_robin,
> disjoint_random, overlap_reshuffle) for the paramserv builtin function.



