[ https://issues.apache.org/jira/browse/SYSTEMML-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482086#comment-16482086
]
Matthias Boehm edited comment on SYSTEMML-2336 at 5/20/18 11:54 PM:

I would recommend simply leveraging existing operations. Similar to our approach for constant
folding, you can temporarily construct hops and execute the generated instructions to perform
the data partitioning. In detail, here is how I would map the different schemes:
* Disjoint_Contiguous: for each worker, use a right indexing operation {{X[beg:end,]}} to
obtain contiguous, non-overlapping partitions of rows.
* Disjoint_Round_Robin: for each worker, use a permutation multiply or, more simply, a removeEmpty
such as {{removeEmpty(target=X, margin="rows", select=(seq(1,nrow(X))%%k)==id)}}.
* Disjoint_Random: for each worker, use a permutation multiply {{P[beg:end,] %*% X}}, where
P is constructed, for example, with {{P=table(seq(1,nrow(X)), sample(nrow(X), nrow(X)))}}, i.e.,
sampling without replacement to ensure disjointness.
* Overlap_Reshuffle: similar to the above, except that you create a new permutation matrix for
each worker and skip the indexing on P.
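
To make the row selection of the four schemes concrete, here is a minimal sketch in plain Python rather than DML. It only mirrors the semantics of the expressions above on nested lists; it is not SystemML code and does not construct hops. All function names are hypothetical, and the ceiling-based block size in disjoint_contiguous is an assumption, since the comment does not say how beg and end are chosen.

```python
import random


def disjoint_contiguous(n_rows, k):
    """Contiguous, non-overlapping blocks of 0-based row indices, one per worker."""
    size = -(-n_rows // k)  # ceiling division (assumed), so the last block may be shorter
    return [list(range(w * size, min((w + 1) * size, n_rows))) for w in range(k)]


def disjoint_round_robin(n_rows, k):
    """Worker wid keeps the rows whose 1-based row number r satisfies r % k == wid."""
    return [[i for i in range(n_rows) if (i + 1) % k == wid] for wid in range(k)]


def permutation_matrix(n_rows, rng):
    """Row i holds a single 1 in column perm[i]; perm is sampled without replacement."""
    perm = rng.sample(range(n_rows), n_rows)
    return [[1 if j == perm[i] else 0 for j in range(n_rows)] for i in range(n_rows)]


def matmul(A, B):
    """Plain nested-list matrix product, standing in for %*%."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def disjoint_random(X, k, rng):
    """One shared permutation matrix P; each worker multiplies its slice of P's rows with X."""
    P = permutation_matrix(len(X), rng)
    return [matmul([P[i] for i in rows], X) for rows in disjoint_contiguous(len(X), k)]


def overlap_reshuffle(X, k, rng):
    """A fresh permutation matrix per worker and no slicing: every worker sees all rows."""
    return [matmul(permutation_matrix(len(X), rng), X) for _ in range(k)]
```

For example, disjoint_contiguous(10, 3) yields the index blocks [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]], and disjoint_round_robin(10, 3) assigns rows [0, 3, 6, 9] (1-based row numbers 1, 4, 7, 10) to the worker with id 1.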
It's probably a good idea to start simple. Hence, I would recommend implementing disjoint_contiguous
first and getting a basic local parameter server running. Once this is done, we can come back
to the other data partitioning schemes.
> Data partition
> 
>
> Key: SYSTEMML-2336
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2336
> Project: SystemML
> Issue Type: Subtask
> Reporter: LI Guobao
> Assignee: LI Guobao
> Priority: Major
>
> This subtask aims to implement the four data partitioning schemes (i.e., disjoint_contiguous,
disjoint_round_robin, disjoint_random, overlap_reshuffle) for the paramserv builtin function.

