giraph-dev mailing list archives

From Wei Yan
Subject Have a question regarding restart from last checkpoint
Date Wed, 30 Apr 2014 03:32:33 GMT
Hi, guys,

I have a question about how Giraph restarts from the last checkpoint due to

I ran an example with 5 workers and 1 master. Two workers were preempted
during the run, but I found that the other 3 workers also quit. I checked
the code and found the following in
BspServiceWorker.processEvent(WatchedEvent event):

if ((ApplicationState.valueOf(jsonObj.getString(JSONOBJ_STATE_KEY)) ==
    ApplicationState.START_SUPERSTEP) &&
    getApplicationAttempt()) {
        LOG.fatal("processEvent: Worker will restart " +
            "from command - " + jsonObj.toString());

Does this mean all "good" workers also need to quit and the job needs to
request resources again? BTW, I use the pure-YARN with
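
To make the question concrete, here is a minimal, self-contained sketch of what a check of this shape would do. It is not Giraph code. It assumes that the job state published by the master carries an application attempt number, and that the condition compares it with the attempt the worker was started with. The class RestartCheckSketch, its enum, and the shouldRestart method are hypothetical names used only for illustration.

```java
// Hypothetical sketch: none of these types are Giraph classes.
public class RestartCheckSketch {
    /** Simplified stand-in for the job states a master could publish. */
    public enum ApplicationState { UNKNOWN, START_SUPERSTEP, FAILED, FINISHED }

    /**
     * Returns true when a worker would exit so that it can be relaunched.
     * Assumption: the job state holds an attempt number, and a worker
     * restarts only if that number differs from its own attempt.
     */
    public static boolean shouldRestart(ApplicationState state,
                                        long jobStateAttempt,
                                        long workerAttempt) {
        return state == ApplicationState.START_SUPERSTEP
            && jobStateAttempt != workerAttempt;
    }

    public static void main(String[] args) {
        // Same attempt number: a healthy worker keeps running.
        System.out.println(
            shouldRestart(ApplicationState.START_SUPERSTEP, 0, 0));
        // Different attempt number: even a healthy worker would exit.
        System.out.println(
            shouldRestart(ApplicationState.START_SUPERSTEP, 1, 0));
    }
}
```

Under that assumption, the second call models the situation described here: the state changes for the whole job, so workers that were never preempted also take the restart branch.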

The following is the log from one "good" worker:

2014-04-29 21:56:55,284 INFO  [main-EventThread] worker.BspServiceWorker
( - processEvent: Job state
changed, checking to see if it needs to restart
2014-04-29 21:56:55,285 INFO  [main-EventThread] bsp.BspService
( - getJobState: Job state already exists
2014-04-29 21:56:55,287 FATAL [main-EventThread] worker.BspServiceWorker
( - processEvent: Worker will
restart from command -

Thanks for your help!
