gearpump-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ASF GitHub Bot (JIRA)" <>
Subject [jira] [Commented] (GEARPUMP-32) Minimum clock of source Tasks maybe inaccurate
Date Fri, 03 Jun 2016 03:47:59 GMT


ASF GitHub Bot commented on GEARPUMP-32:

GitHub user manuzhang opened a pull request:

    fix GEARPUMP-32, introduce source watermark

    This PR introduces source watermark such that 
    1. messages with timestamps earlier than watermark could be filtered out at source.
    2. minclock of source processor will be bounded by watemark

You can merge this pull request into a Git repository by running:

    $ git pull watermark

Alternatively you can review and apply these changes as the patch at:

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #33
commit d3d6466a10a0475a9e83dd07d20e9b4d6c1bdb9e
Author: manuzhang <>
Date:   2016-05-31T05:10:24Z

    fix GEARPUMP-32, introduce source watermark


> Minimum clock of source Tasks maybe inaccurate
> ----------------------------------------------
>                 Key: GEARPUMP-32
>                 URL:
>             Project: Apache Gearpump
>          Issue Type: Bug
>          Components: streaming
>    Affects Versions: 0.8.0
>            Reporter: Manu Zhang
>            Assignee: Manu Zhang
>             Fix For: 0.8.1
> Moved from [] and reported by [Zhu Yueqian|]
> {quote}
> Source tasks have not any upstreamClocks. So, startClock is the minimum of pending clocks
when recover happen.
> eg below:
> source task1: timeStamp:15,not ACK, minClockValue maybe is 15(<= 15).
> source task2: timeStamp:10,ACKed, minClockValue maybe is Long.MaxValue
> when recover happen,startClock maybe is 15. where is the data between 10 to 15 at task2?
> {quote}
> More context on this issue:
> In Gearpump, we maintain a global minimum clock tracked from a message's timestamp across
all tasks. It means messages with timestamp before this clock have all been processed. An
application will restart from this value on failure, and thus at-least-once message delivery
could be guaranteed. 
> The global minimum clock is the lower bound of all the Tasks' minimum clocks. 
> For a task, the minimum clock is the lower of 
> # upstream minimum clock
> # a. the minimum timestamp of unacked messages
>    b. Long.MaxValue if all messages have been acked.
> Note that 2.b allows the global minimum clock to progress and it is almost safe since
the clock is also bounded by the upstream minimum clock. I said "almost safe" because a source
task has no upstream but we assume the upstream minimum clock is Long.MaxValue. Thus, the
scenario described by Zhu Yueqian could happen and breaks at-least-once guarantee. 

This message was sent by Atlassian JIRA

View raw message