apex-dev mailing list archives

From Sandesh Hegde <sand...@datatorrent.com>
Subject Re: APEXCORE-619 Recovery windowId in future during application relaunch.
Date Wed, 01 Mar 2017 20:29:18 GMT
1. Create an empty checkpoint file for the stateless operators.
2. Remove the logic to treat stateless operators as a special case.

The rest of the design remains as is.
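
A minimal sketch of the two points above, in plain Python rather than the Apex code base. The function names, the one-directory-per-operator layout, and the hex file names are all assumptions made for illustration. A stateless operator writes a zero-byte file, so the lookup at relaunch needs no special case:

```python
import os
import tempfile


def save_checkpoint(base_dir, operator_id, window_id, state=b""):
    """Write one checkpoint file per (operator, window).

    A stateless operator passes no state, so its file is empty; the
    file's presence alone records that the window was checkpointed.
    """
    op_dir = os.path.join(base_dir, str(operator_id))
    os.makedirs(op_dir, exist_ok=True)
    # Hex file names are an assumption of this sketch.
    with open(os.path.join(op_dir, format(window_id, "x")), "wb") as f:
        f.write(state)


def latest_checkpoint(base_dir, operator_id):
    """Return the highest checkpointed windowId, or None if there is none.

    The same lookup serves stateful and stateless operators.
    """
    op_dir = os.path.join(base_dir, str(operator_id))
    if not os.path.isdir(op_dir):
        return None
    windows = [int(name, 16) for name in os.listdir(op_dir)]
    return max(windows) if windows else None


if __name__ == "__main__":
    base = tempfile.mkdtemp()
    save_checkpoint(base, "stateful-op", 10, b"some state")
    save_checkpoint(base, "stateless-op", 12)  # empty file
    print(latest_checkpoint(base, "stateful-op"))
    print(latest_checkpoint(base, "stateless-op"))
```

With this layout the recovery windowId of a stateless operator comes from a file that was actually written, not from a value derived at relaunch time.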

On Wed, Mar 1, 2017 at 11:18 AM Amol Kekre <amol@datatorrent.com> wrote:

> The third option should be it.
> 1. On relaunch the DAG should start at committedWindowId
> 2. Pruning of checkpoints should only happen after committedWindowId is
> written by Stram state
>
> Thks
> Amol
>
>
> On Wed, Mar 1, 2017 at 5:34 AM, Tushar Gosavi <tushar@apache.org> wrote:
>
> > Help Needed for APEXCORE-619
> >
> > Issue: When an application is relaunched after a long time with stateless
> > operators at the end of the DAG, the stateless operators start with a very
> > high windowId. In this case a stateless operator ignores all the data it
> > receives until the upstream operator catches up with it. This breaks the
> > *at-least-once* guarantee when the operator is relaunched, or when the
> > master is killed and the application is restarted.
> >
> > Solutions:
> > - Fix the windowId of stateless leaf operators from the upstream operator.
> > This has some issues when we have a join with two upstream operators at
> > different windowIds. If we set the windowId to min(upstream windowId), then
> > we need to recalculate the new recovery windowIds for the upstream paths
> > from this operator.
> >
> > - Another solution is to create an empty file in the checkpoint directory
> > for stateless operators. This will help us identify the checkpoints of
> > stateless operators during relaunch instead of computing them from the
> > latest timestamp.
> >
> > - Bring the entire DAG to committedWindowId. This could be achieved by
> > writing committedWindowId to a journal. We need to make sure that we do
> > not purge the checkpointed state until the committedWindowId is saved in
> > the journal.
> >
> > Let me know your thoughts on this and your preferred solution.
> >
> > Regards,
> > -Tushar.
> >
>
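
The third option in the quoted thread (relaunch at committedWindowId, and prune checkpoints only after that id has been journaled) can be sketched roughly as below. This is an illustration in plain Python, not Apex code: `recovery_window`, `prune`, and the dict-of-lists checkpoint model are hypothetical names and structures, and keeping the latest checkpoint at or before the committed window is an assumption about what "bring the DAG to committedWindowId" requires.

```python
def recovery_window(windows, committed_window_id):
    """Latest checkpointed window at or before the committed window.

    Returns None when the operator has no such checkpoint.
    """
    candidates = [w for w in windows if w <= committed_window_id]
    return max(candidates) if candidates else None


def prune(checkpoints, journaled_committed):
    """Return a pruned copy of {operator: [windowId, ...]}.

    Nothing is removed until a committedWindowId has been journaled.
    After that, each operator keeps the checkpoint it would recover
    from, plus everything newer.
    """
    if journaled_committed is None:
        return {op: list(ws) for op, ws in checkpoints.items()}
    pruned = {}
    for op, windows in checkpoints.items():
        keep_from = recovery_window(windows, journaled_committed)
        if keep_from is None:
            # No checkpoint at or before the committed window: keep all.
            pruned[op] = list(windows)
        else:
            pruned[op] = [w for w in windows if w >= keep_from]
    return pruned


if __name__ == "__main__":
    cps = {"upstream": [5, 10, 15], "stateless-leaf": [8, 12]}
    print(prune(cps, None))  # committed id not journaled yet: unchanged
    print(prune(cps, 12))    # older checkpoints are dropped
```

Because pruning is gated on the journaled value, a relaunch can always find, for every operator, a checkpoint no later than the committedWindowId it reads back from the journal.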