incubator-s4-dev mailing list archives

From Dingyu Yang <>
Subject Fault tolerance and communication
Date Thu, 21 Mar 2013 03:06:32 GMT
I am testing the fault tolerance section, but I cannot recover the state of
a failed node.
I have one adapter, one app node, and one standby node. Checkpointing runs
with the base config of 20 seconds.
When the app node is stopped, the standby node acquires the task, but the
state is not recovered.
Could you check this, or do I need to set some other configuration?

Another problem concerns the communication between the adapter and the app.
I ran the word count experiment on a 500M file containing 80775764 words,
with multiple nodes for the app partitions and one node for the adapter.
With one adapter node and one app node, the adapter finishes sending all the
words in 35 seconds.
With one adapter node and two app nodes, the adapter finishes in 61 seconds.
With one adapter node and three app nodes, the adapter finishes in 95 seconds.
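
To make the slowdown concrete, the timings above can be turned into an
average send rate for the adapter. This is only a rough sketch: the helper
names are made up for illustration, and it assumes each reported time covers
sending the whole file.

```python
# Word count and elapsed times are the figures reported in this message.
TOTAL_WORDS = 80_775_764
ELAPSED_SECONDS = {1: 35, 2: 61, 3: 95}  # app nodes -> adapter send time (s)


def words_per_second(total_words, seconds):
    """Average adapter send rate (hypothetical helper, for illustration)."""
    return total_words / seconds


def rates_by_app_nodes(total_words, elapsed):
    """Map each app node count to the adapter's average send rate."""
    return {
        nodes: words_per_second(total_words, seconds)
        for nodes, seconds in elapsed.items()
    }


if __name__ == "__main__":
    for nodes, rate in rates_by_app_nodes(TOTAL_WORDS, ELAPSED_SECONDS).items():
        print(f"{nodes} app node(s): {rate:,.0f} words/s")
```

The rate drops as app nodes are added, which is the opposite of what I
expected.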

The adapter runs on the same node with the same program in every case.
I expected the adapter's time to stay the same or decrease as app nodes are
added, since the processing capacity has increased.
I don't know what the problem is.

Thank you!
