apex-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Amol Kekre <a...@datatorrent.com>
Subject Re: APEXCORE-619 Recovery windowId in future during application relaunch.
Date Wed, 01 Mar 2017 19:18:51 GMT
The third option should be it.
1. On relaunch the DAG should start at commitWindowId
2. Pruning of checkpoints should only happen after committedWindowId is
written by Stram state

Thks
Amol



E:amol@datatorrent.com | M: 510-449-2606 | Twitter: @*amolhkekre*

www.datatorrent.com  |  apex.apache.org

*Join us at Apex Big Data World-San Jose
<http://www.apexbigdata.com/san-jose.html>, April 4, 2017!*
[image: http://www.apexbigdata.com/san-jose-register.html]
<http://www.apexbigdata.com/san-jose-register.html>

On Wed, Mar 1, 2017 at 5:34 AM, Tushar Gosavi <tushar@apache.org> wrote:

> Help Needed for APEXCORE-619
>
> Issue : When application is relaunched after long time with stateless
> opeartors at the end of the DAG, the stateless operators starts with a very
> high windowId. In this case the stateless operator ignors all the data
> received till upstream operator catches up with it. This breaks the
> *at-least-once* gaurantee while relaunch of the opeartor or when master is
> killed and application is restarted.
>
> Solutions:
> - Fix windowId for stateless leaf operators from upstream opeartor. But it
> has some issues when we have a join with two upstrams operators at
> different windowId. If we set the windowID to min(upstream windowId), then
> we need to again recalulate the new recovery window ids for upstream paths
> from this operators.
>
> - Other solution is to create a empty file in checkpoint directory for
> stateless operators. This will help us to identify the checkpoints of
> stateless operators during relaunch instead of computing from latest
> timestamp.
>
> - Bring the entire DAG to committedWindowId. This could be achived using
> writing committedWindowId in a journal. we need to make sure that we are
> not puring the checkpointed state until the committedWundowId is saved in
> journal.
>
> Let me know your thoughs on this and preferred solution.
>
> Regards,
> -Tushar.
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message