systemml-issues mailing list archives

From "LI Guobao (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SYSTEMML-2421) Task error and preemption handles
Date Tue, 26 Jun 2018 15:59:00 GMT

     [ https://issues.apache.org/jira/browse/SYSTEMML-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

LI Guobao updated SYSTEMML-2421:
--------------------------------
    Description: This sub-task aims to introduce checkpointing so that a worker can
recover from a previous failure. Specifically, once a worker is brought up, it pulls the
current state of the model. Checkpointing can be set to, for example, EPOCH10, which means
that the state is persisted to a centralized file on the server side every 10 epochs.
(was: This sub-task aims to introduce checkpointing so that a worker can recover from a
previous failure. Specifically, once a worker is brought up, it pulls the current state of
the model. Checkpointing can be set to, for example, EPOCH10, which means that the state is
persisted to a file on the worker side every 10 epochs.)

> Task error and preemption handles
> ---------------------------------
>
>                 Key: SYSTEMML-2421
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-2421
>             Project: SystemML
>          Issue Type: Sub-task
>            Reporter: LI Guobao
>            Assignee: LI Guobao
>            Priority: Major
>
> This sub-task aims to introduce checkpointing so that a worker can recover from a
previous failure. Specifically, once a worker is brought up, it pulls the current state of
the model. Checkpointing can be set to, for example, EPOCH10, which means that the state is
persisted to a centralized file on the server side every 10 epochs.
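
The checkpointing behaviour described above could be sketched roughly as follows. This is
only an illustration under stated assumptions, not SystemML code: the names
`ParameterServer`, `parse_checkpoint_frequency`, `pull` and `finish_epoch`, the JSON file
format, and the single-weight model are all hypothetical. The sketch only assumes that a
setting such as EPOCH10 means "persist every 10 epochs" and that the state lives in one
file on the server side.

```python
import json
import os
import re
import tempfile


def parse_checkpoint_frequency(spec):
    """Turn a setting such as "EPOCH10" into an epoch interval (10).

    The accepted syntax is an assumption made for this sketch.
    """
    match = re.fullmatch(r"EPOCH(\d+)", spec)
    if match is None or int(match.group(1)) == 0:
        raise ValueError(f"unsupported checkpoint setting: {spec!r}")
    return int(match.group(1))


class ParameterServer:
    """Toy server that keeps the model and checkpoints it to one central file."""

    def __init__(self, checkpoint_path, checkpoint_spec="EPOCH10"):
        self.checkpoint_path = checkpoint_path
        self.interval = parse_checkpoint_frequency(checkpoint_spec)
        self.epoch = 0
        self.model = {"w": 0.0}  # placeholder model state
        if os.path.exists(checkpoint_path):
            self._restore()  # recover from a previous failure

    def _restore(self):
        with open(self.checkpoint_path) as f:
            state = json.load(f)
        self.epoch = state["epoch"]
        self.model = state["model"]

    def _persist(self):
        # Write to a temporary file first, then replace the checkpoint in one
        # step, so a crash mid-write does not leave a truncated checkpoint.
        directory = os.path.dirname(self.checkpoint_path) or "."
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump({"epoch": self.epoch, "model": self.model}, f)
        os.replace(tmp_path, self.checkpoint_path)

    def pull(self):
        """Return a copy of the current model; a worker calls this on start-up."""
        return dict(self.model)

    def finish_epoch(self, update):
        """Apply one epoch's update and checkpoint every `interval` epochs."""
        self.model["w"] += update
        self.epoch += 1
        if self.epoch % self.interval == 0:
            self._persist()
```

In this sketch, a server restarted after 25 epochs with EPOCH10 resumes from the checkpoint
taken at epoch 20, and a newly started worker receives that state from `pull()`. Work done
after the last checkpoint is lost, which is the trade-off of a larger interval.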



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
