systemml-issues mailing list archives

From "LI Guobao (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SYSTEMML-2418) Spark data partitioner
Date Thu, 28 Jun 2018 13:31:00 GMT

     [ https://issues.apache.org/jira/browse/SYSTEMML-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

LI Guobao updated SYSTEMML-2418:
--------------------------------
    Description: In the context of ML, it would be more efficient to support data partitioning
in a distributed manner. This task aims to do the data partitioning on Spark: all the data is
first split among the workers, each worker then partitions its split locally according to the
chosen scheme, and the partitioned data, which stays on each worker, can be passed directly
to model training.  (was: In the context of ML, the training data is usually too large to
fit on the Spark driver node, so partitioning such enormous data in CP is no longer feasible.
This task aims to do the data partitioning in a distributed way: each worker receives its
split of the training data and partitions it locally according to the different schemes.
All the data is then grouped by the given key (i.e., the worker id) and finally written
to a separate HDFS file in the scratch space.)

> Spark data partitioner
> ----------------------
>
>                 Key: SYSTEMML-2418
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-2418
>             Project: SystemML
>          Issue Type: Sub-task
>            Reporter: LI Guobao
>            Assignee: LI Guobao
>            Priority: Major
>
> In the context of ML, it would be more efficient to support data partitioning in a
distributed manner. This task aims to do the data partitioning on Spark: all the data is
first split among the workers, each worker then partitions its split locally according to
the chosen scheme, and the partitioned data, which stays on each worker, can be passed
directly to model training.
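
The flow described above (split among workers, partition locally by scheme, train on the local result) can be sketched in plain Python without Spark. This is only an illustration under stated assumptions, not the SystemML implementation: the function names, the two scheme names ("contiguous" and "shuffle"), and the mean-label "training" step are all hypothetical stand-ins.

```python
import random


def split_among_workers(rows, num_workers):
    """Step 1: split all rows into contiguous blocks, one block per worker.

    Assumption: a simple block split stands in for however Spark
    actually distributes the data.
    """
    size = -(-len(rows) // num_workers)  # ceiling division
    return [rows[i * size:(i + 1) * size] for i in range(num_workers)]


def partition_on_worker(local_rows, scheme, seed=0):
    """Step 2: each worker rearranges only its own split, per the scheme.

    The scheme names here are hypothetical examples.
    """
    if scheme == "contiguous":
        return list(local_rows)
    if scheme == "shuffle":
        shuffled = list(local_rows)
        random.Random(seed).shuffle(shuffled)
        return shuffled
    raise ValueError(f"unknown scheme: {scheme}")


def train_on_worker(partition):
    """Step 3: placeholder for model training on the local partition.

    Returns the mean label, or None for an empty partition.
    """
    if not partition:
        return None
    return sum(label for _, label in partition) / len(partition)


def run_pipeline(rows, num_workers, scheme):
    """Run the three steps; no data moves between workers after step 1."""
    splits = split_among_workers(rows, num_workers)
    return [train_on_worker(partition_on_worker(s, scheme, seed=w))
            for w, s in enumerate(splits)]
```

The point of the sketch is that the partitioned data is consumed where it was produced: nothing is collected on a driver or written out between the partitioning and training steps.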



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
