gearpump-dev mailing list archives

From "Manu Zhang (JIRA)" <>
Subject [jira] [Created] (GEARPUMP-32) Minimum clock of source Tasks maybe inaccurate
Date Fri, 15 Apr 2016 04:15:25 GMT
Manu Zhang created GEARPUMP-32:

             Summary: Minimum clock of source Tasks maybe inaccurate
                 Key: GEARPUMP-32
             Project: Apache Gearpump
          Issue Type: Bug
          Components: streaming
    Affects Versions: 0.8.0
            Reporter: Manu Zhang
            Assignee: Manu Zhang

Moved from [] and reported by [Zhu Yueqian|]

Source tasks do not have any upstreamClocks, so startClock is the minimum of the pending
clocks when recovery happens.

For example:
source task1: timestamp 15, not yet acked; minClockValue may be 15 (at most 15).
source task2: timestamp 10, acked; minClockValue may be Long.MaxValue.
When recovery happens, startClock may be 15. What happens to task2's data between 10 and 15?

More context on this issue:

In Gearpump, we maintain a global minimum clock, tracked from message timestamps across
all tasks. It means that messages with a timestamp before this clock have all been processed.
An application restarts from this value on failure, and thus at-least-once message delivery
can be guaranteed.
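
To illustrate why restarting from the global minimum clock gives at-least-once delivery, here is a minimal sketch. It is not Gearpump code: the replayable log of (timestamp, payload) pairs and the name replay_from are assumptions made up for this example.

```python
def replay_from(log, start_clock):
    """Return the messages a restarted application would read again.

    `log` is a hypothetical replayable source: a list of
    (timestamp, payload) pairs. Everything before `start_clock` is
    known to be fully processed, so it is skipped.
    """
    return [(ts, payload) for ts, payload in log if ts >= start_clock]


log = [(5, "a"), (10, "b"), (15, "c")]

# Suppose the global minimum clock was 10 when the failure happened.
replayed = replay_from(log, 10)
print(replayed)
```

Messages at or after the clock may already have been processed before the failure, so they can be delivered twice, but none of them is skipped as long as the clock itself is accurate.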

The global minimum clock is the lower bound of all the tasks' minimum clocks.
For a task, the minimum clock is the lower of

  1. the upstream minimum clock
  2. a. the minimum timestamp of unacked messages, or
     b. Long.MaxValue if all messages have been acked.
Note that 2.b allows the global minimum clock to progress, and it is almost safe since the
clock is also bounded by the upstream minimum clock. I say "almost safe" because a source
task has no upstream, so we assume its upstream minimum clock is Long.MaxValue. Thus, the
scenario described by Zhu Yueqian can happen and break the at-least-once guarantee.
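
The rule above and the failing scenario can be sketched as follows. This is a simplified model, not the actual Gearpump implementation: task_min_clock, global_min_clock and LONG_MAX (standing in for Long.MaxValue) are names assumed for the example.

```python
LONG_MAX = 2**63 - 1  # stands in for Long.MaxValue


def task_min_clock(upstream_min_clock, unacked_timestamps):
    """Lower of the upstream minimum clock and the task's own pending clock."""
    # Rule 2.a / 2.b: minimum unacked timestamp, or LONG_MAX if all are acked.
    own = min(unacked_timestamps) if unacked_timestamps else LONG_MAX
    return min(upstream_min_clock, own)


def global_min_clock(task_clocks):
    """Lower bound of all the tasks' minimum clocks."""
    return min(task_clocks)


# A non-source task stays bounded by its upstream even when fully acked.
downstream = task_min_clock(8, [])

# Source tasks have no upstream, so the upstream clock is assumed LONG_MAX.
task1 = task_min_clock(LONG_MAX, [15])  # timestamp 15 sent, not yet acked
task2 = task_min_clock(LONG_MAX, [])    # timestamp 10 sent and acked

start_clock = global_min_clock([task1, task2])
print(downstream, task1, task2 == LONG_MAX, start_clock)
```

In this model task2 contributes Long.MaxValue although it has only emitted data up to timestamp 10, so the restart point becomes 15 and whatever task2 would have produced between 10 and 15 is not replayed.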
