gearpump-dev mailing list archives

From "Sean Zhong (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (GEARPUMP-8) Two machines can possibly have same worker Id when master restart in single-master cluster
Date Tue, 05 Apr 2016 03:18:25 GMT

    [ https://issues.apache.org/jira/browse/GEARPUMP-8?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15225590#comment-15225590 ]

Sean Zhong commented on GEARPUMP-8:
-----------------------------------

Fixed by https://github.com/gearpump/gearpump/pull/2028

> Two machines can possibly have same worker Id when master restart in single-master cluster
> ------------------------------------------------------------------------------------------
>
>                 Key: GEARPUMP-8
>                 URL: https://issues.apache.org/jira/browse/GEARPUMP-8
>             Project: Apache Gearpump
>          Issue Type: Bug
>            Reporter: Sean Zhong
>            Assignee: Sean Zhong
>
> *Why should we NOT allow duplicate worker ids?*
> We use the worker id to track the resources of a single machine. If two machines have the same worker id, it creates a lot of confusion.
> *What is the pre-condition to trigger this issue?*
> This happens when the cluster has only one master and that master restarts.
> If the cluster has multiple masters, it is not affected by this issue.
> *How does this issue happen?*
> When the master restarts and there are no other master machines for HA, the master state is lost, including the list of worker ids already occupied by existing workers. When a new worker machine then joins, it gets a fresh worker id starting from 0, which can conflict with the id of an existing worker machine.
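
The failure mode described above can be reproduced with a toy model. This is only a sketch in Python, not Gearpump code: the `Master` class and its `register` method are hypothetical names, and the only behavior assumed is that ids come from an in-memory counter that starts at 0.

```python
class Master:
    """Toy single master that hands out sequential worker ids (hypothetical model)."""

    def __init__(self):
        # Kept in memory only, so it is lost when the master restarts.
        self.next_id = 0

    def register(self):
        worker_id = self.next_id
        self.next_id += 1
        return worker_id


master = Master()
old_worker = master.register()  # existing worker; it keeps running across the restart

master = Master()               # restart without a standby master: state starts empty
new_worker = master.register()  # a new machine joins after the restart

# Both machines now hold worker id 0.
collision = old_worker == new_worker
```

With a second master holding a copy of the counter, the restarted master could recover the occupied ids, which is why multi-master clusters are not affected.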
> *Suggested fix?*
> Instead of using the sequence 0, 1, 2, 3, 4... for worker ids, we append a timestamp: the time at which the worker registers itself with the master.
> Like this:
> {quote}
> WorkerId(0, timestamp1)
> WorkerId(1, timestamp2)
> ...
> {quote}
> Then, when the master restarts, new and old workers can be told apart by the timestamp, because their registration times differ.
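
A minimal sketch of the suggested fix, again as an illustrative Python model rather than the actual change in the pull request. The field names `session_id` and `register_time`, the `Master` class, and the injected `clock` callable are all assumptions made for this example; the issue text only specifies the shape `WorkerId(n, timestamp)`.

```python
import itertools
from typing import Callable, NamedTuple


class WorkerId(NamedTuple):
    session_id: int     # sequence number; restarts from 0 when the master restarts
    register_time: int  # time the worker registered with the master (assumed: epoch ms)


class Master:
    """Toy master that issues (sequence, timestamp) worker ids (hypothetical model)."""

    def __init__(self, clock: Callable[[], int]):
        self._next = itertools.count()  # in-memory counter, lost on restart
        self._clock = clock             # injected so the example is deterministic

    def register(self) -> WorkerId:
        return WorkerId(next(self._next), self._clock())


# Fake clock returning two distinct registration times.
clock = iter([1000, 2000]).__next__

old = Master(clock).register()  # registered before the restart
new = Master(clock).register()  # registered with the restarted master

# Same sequence number, but the ids differ because the timestamps differ.
same_sequence = old.session_id == new.session_id
distinct_ids = old != new
```

The sketch relies on the same assumption as the proposal itself: a worker registering after the restart never gets the same timestamp as an earlier worker with the same sequence number. In real use the clock could be something like `lambda: int(time.time() * 1000)`.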



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
