apex-dev mailing list archives

From Sandesh Hegde <sand...@datatorrent.com>
Subject Re: Improving Apex relaunch time.
Date Wed, 21 Sep 2016 14:46:50 GMT
Relaunching from the same location can be one of the options.

On Tue, Sep 20, 2016, 10:17 PM Tushar Gosavi <tushar@datatorrent.com> wrote:

> In case of application failure, we would like to have the ability to
> quickly restart the application while keeping the old state for failure
> analysis. The same problem also remains when we want to start from a
> savepoint, where we will need to copy state from the savepoint to the
> application.
>
> -Tushar.
>
>
>
> On Tue, Sep 20, 2016 at 8:34 PM, Sandesh Hegde <sandesh@datatorrent.com> wrote:
> > How about re-launching the app from the same location?
> >
> > If users want to preserve the state, we can provide a savepoint feature.
> >
> > On Tue, Sep 20, 2016 at 4:39 AM Tushar Gosavi <tushar@datatorrent.com> wrote:
> >
> >> We have observed that application relaunch takes a long time.
> >> One major reason for the delay in application startup during relaunch
> >> is the time taken to copy the state of the existing application to the
> >> new application. This state can grow to several GBs, and the copy is
> >> performed in a single thread before the new application is submitted
> >> to YARN.
> >>
> >> The state of the previous application consists of:
> >> - jars
> >> - stram checkpoint/recovery file
> >> - events
> >> - container file
> >> - stats recordings, if enabled
> >> - operator checkpoints
> >> - operator data
> >>
> >> We could avoid copying debugging data such as stats recordings, which
> >> can run into TBs for long-running applications and are not required
> >> for the new application to function.
> >>
> >> Similarly, operator checkpoints could be read in parallel when the
> >> operators are launched for the first time. This would also let us copy
> >> only the required checkpoints, and the copying would be done in
> >> parallel by multiple containers/threads.
> >>
> >> For operator data stored in the application directory, we could copy
> >> it completely for now, but in the future we could provide a callback
> >> that allows an operator partition to read only the required state from
> >> the previous location.
> >>
> >> Let me know your thoughts on this.
> >>
> >> Regards,
> >> - Tushar.
> >>
>
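A minimal sketch of two ideas from the proposal quoted above: skip debugging-only data (such as stats recordings) and copy the remaining state with several threads instead of one. It assumes a local filesystem and uses only Python's standard library. A real Apex application directory lives on HDFS, so an actual implementation would go through Hadoop's FileSystem API instead. The function name `copy_app_state` and the directory names (`recordings`, `checkpoints`, `jars`) are hypothetical and do not describe Apex's actual layout or API.

```python
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Hypothetical name of a debugging-only top-level directory (stats recordings)
# that a relaunched application does not need in order to run.
DEFAULT_SKIP = frozenset({"recordings"})


def copy_app_state(old_dir, new_dir, skip=DEFAULT_SKIP, workers=8):
    """Copy every file under old_dir into new_dir using a thread pool,
    leaving out files whose top-level directory is listed in `skip`.
    Returns the sorted relative paths (POSIX style) of the copied files."""
    src_root = Path(old_dir)
    dst_root = Path(new_dir)

    # Collect the files to copy, dropping anything under a skipped directory.
    files = [
        p for p in src_root.rglob("*")
        if p.is_file() and p.relative_to(src_root).parts[0] not in skip
    ]

    def copy_one(src):
        rel = src.relative_to(src_root)
        dst = dst_root / rel
        # exist_ok=True tolerates several threads creating the same parent.
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        return rel.as_posix()

    # Each file is an independent task, so e.g. per-operator checkpoint
    # files are copied concurrently rather than one after another.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sorted(pool.map(copy_one, files))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        old = Path(tmp) / "old"
        for rel in ["jars/app.jar", "checkpoints/op1/10",
                    "checkpoints/op2/10", "recordings/stats.bin"]:
            f = old / rel
            f.parent.mkdir(parents=True, exist_ok=True)
            f.write_text("x")
        print(copy_app_state(old, Path(tmp) / "new"))
```

Copying per file rather than per top-level directory is a deliberate choice in this sketch: a single large checkpoint directory would otherwise still be copied by one thread. Reading only the checkpoints an operator actually needs, as suggested in the thread, would further shrink the `files` list.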
