apex-dev mailing list archives

From Amol Kekre <a...@datatorrent.com>
Subject Re: APEXCORE-619 Recovery windowId in future during application relaunch.
Date Wed, 01 Mar 2017 19:18:51 GMT
The third option should be it.
1. On relaunch, the DAG should start at committedWindowId.
2. Pruning of checkpoints should happen only after committedWindowId is
written by Stram state.
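
The two steps above can be sketched as follows. This is a minimal illustration in Python, not Apex code: `CommitJournal`, `commit_and_prune`, `recovery_window`, the JSON file layout, and the rule of keeping the newest checkpoint at or before the committed window are all assumptions made for the example.

```python
import json
import os
import tempfile


class CommitJournal:
    """Hypothetical journal that durably records the committed window id."""

    def __init__(self, path):
        self.path = path

    def write(self, window_id):
        # Write to a temp file, fsync, then rename, so a crash never
        # leaves a partially written record behind.
        directory = os.path.dirname(self.path) or "."
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump({"committedWindowId": window_id}, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)

    def read(self):
        # Returns None when nothing has been journaled yet.
        if not os.path.exists(self.path):
            return None
        with open(self.path) as f:
            return json.load(f)["committedWindowId"]


def commit_and_prune(journal, checkpoints, committed_window_id):
    """Journal the committed window first, then prune older checkpoints.

    checkpoints maps operator name -> ascending list of checkpoint window ids.
    Returns the checkpoints that survive pruning.
    """
    journal.write(committed_window_id)  # step 2: persist before pruning
    kept = {}
    for op, windows in checkpoints.items():
        older = [w for w in windows if w <= committed_window_id]
        # Assumption: keep the newest checkpoint at or before the committed
        # window (needed to restore state) plus everything after it.
        keep_from = older[-1] if older else None
        kept[op] = [w for w in windows if keep_from is None or w >= keep_from]
    return kept


def recovery_window(journal):
    """Step 1: on relaunch every operator starts from the journaled id."""
    return journal.read()
```

Because the journal write completes before any checkpoint is removed, a relaunch that reads the journal should always find the state it needs, under the assumptions stated above.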




On Wed, Mar 1, 2017 at 5:34 AM, Tushar Gosavi <tushar@apache.org> wrote:

> Help needed for APEXCORE-619
> Issue: When an application is relaunched after a long time with stateless
> operators at the end of the DAG, the stateless operators start with a very
> high windowId. In this case the stateless operator ignores all the data
> received until the upstream operator catches up with it. This breaks the
> *at-least-once* guarantee on relaunch of the operator or when the master is
> killed and the application is restarted.
> Solutions:
> - Set the windowId for stateless leaf operators from the upstream operator.
> But this has some issues when we have a join with two upstream operators at
> different windowIds. If we set the windowId to min(upstream windowId), then
> we need to recalculate the new recovery window ids for the upstream paths
> from this operator.
> - Another solution is to create an empty file in the checkpoint directory
> for stateless operators. This will help us identify the checkpoints of
> stateless operators during relaunch instead of computing them from the
> latest timestamp.
> - Bring the entire DAG to committedWindowId. This could be achieved by
> writing committedWindowId in a journal. We need to make sure that we are
> not purging the checkpointed state until the committedWindowId is saved in
> the journal.
> Let me know your thoughts on this and your preferred solution.
> Regards,
> -Tushar.
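
To make the issue and the first proposed solution in the quoted message concrete, here is a toy model in Python. It assumes window ids are consecutive integers and that an operator recovering at window W ignores every window up to and including W; real Apex window ids and recovery logic are more involved. The names `dropped_windows` and `align_join` are hypothetical.

```python
def dropped_windows(upstream_recovery, leaf_recovery):
    """Windows the upstream replays that a stateless leaf would ignore.

    Assumption: the upstream replays from upstream_recovery + 1 and the
    leaf discards every window at or below its own recovery window.
    """
    return list(range(upstream_recovery + 1, leaf_recovery + 1))


def align_join(upstream_recovery_windows):
    """Pick min(upstream windowId) for a stateless join.

    Returns the chosen window and the upstream operators whose recovery
    window is ahead of it. Those paths are the ones that would need their
    recovery window ids recalculated (one reading of the concern raised
    in the quoted message).
    """
    target = min(upstream_recovery_windows.values())
    ahead = sorted(
        op for op, w in upstream_recovery_windows.items() if w > target
    )
    return target, ahead
```

For example, with an upstream recovering at window 100 and a leaf starting at window 500, `dropped_windows(100, 500)` lists the 400 windows the leaf would skip. With upstreams `{"left": 100, "right": 120}`, `align_join` picks 100 and reports `right` as the path that is ahead.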
