Subject: Re: correct and fast way to stop streaming application
From: Cody Koeninger
To: Krot Viacheslav
Cc: varun sharma, user@spark.apache.org
Date: Tue, 27 Oct 2015 10:30:49 -0500

If you want to make sure that your offsets are increasing without gaps, one way to do that is to enforce that invariant when you're saving to your database. That would probably mean using a real database instead of ZooKeeper, though.
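
For what it's worth, a minimal sketch of enforcing that invariant at save time with plain JDBC might look like the following (illustration only, not code from this thread; the kafka_offsets table and its columns are hypothetical):

import java.sql.DriverManager

// Commit the new offset only if the stored offset equals this batch's
// starting offset; anything else means a gap or a replay, so fail loudly.
def commitOffset(jdbcUrl: String, topic: String, partition: Int,
                 fromOffset: Long, untilOffset: Long): Unit = {
  val conn = DriverManager.getConnection(jdbcUrl)
  try {
    val stmt = conn.prepareStatement(
      "UPDATE kafka_offsets SET until_offset = ? " +
      "WHERE topic = ? AND kafka_partition = ? AND until_offset = ?")
    stmt.setLong(1, untilOffset)
    stmt.setString(2, topic)
    stmt.setInt(3, partition)
    stmt.setLong(4, fromOffset)
    if (stmt.executeUpdate() != 1) {
      throw new IllegalStateException(
        s"Rejected offsets $fromOffset-$untilOffset for $topic/$partition: " +
          "stored offset does not match the batch's starting offset")
    }
  } finally {
    conn.close()
  }
}

Ideally that UPDATE runs in the same transaction as the write of the batch's results, so results and offsets either land together or not at all.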

On Tue, Oct 27, 2015 at 4:13 AM, Krot Viacheslav <krot.vyacheslav@gmail.com> wrote:
Any ideas? This is so important because we use Kafka direct streaming and save processed offsets manually as the last step in the job, so we achieve at-least-once semantics.
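
(Roughly, that save-offsets-last pattern with the direct stream looks like the sketch below; stream, processAndStore and saveOffsetsToZookeeper are placeholder names, not the actual code.)

import org.apache.spark.streaming.kafka.{HasOffsetRanges, OffsetRange}

stream.foreachRDD { rdd =>
  // the direct stream exposes the exact Kafka offset range behind each RDD
  val offsetRanges: Array[OffsetRange] = rdd.asInstanceOf[HasOffsetRanges].offsetRanges

  processAndStore(rdd)                 // do the real work first...
  saveOffsetsToZookeeper(offsetRanges) // ...then record offsets, giving at-least-once
}
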
But see what happens when a new batch is scheduled after a job fails:
- suppose we start from offset 10, loaded from ZooKeeper
- the job starts with offsets 10-20
- the job fails N times; awaitTermination notices that and stops the context (or even the JVM with System.exit), but the scheduler has already started a new job, the one for offsets 20-30, and sent it to an executor
- the executor runs all the steps (if there is only one stage) and saves offset 30 to ZooKeeper

This way I lose the data in offsets 10-20.

How should this be handled correctly?

On Mon, 26 Oct 2015 at 18:37, varun sharma <varunsharmansit@gmail.com> wrote:
+1, wanted to do the same.

On Mon, Oct 26, 2015 at 8:58 PM, Krot Viacheslav <krot.vyacheslav@gmail.com> wrote:
Hi all,

I wonder what is the correct way to stop a streaming application if some job fails?
What I have now:

val ssc = new StreamingContext
....
ssc.start()
try {
  ssc.awaitTermination()
} catch {
  case e => ssc.stop(stopSparkContext = true, stopGracefully = false)
}

It works, but one problem still exists - after a job fails and before the streaming context is stopped, it manages to start the job for the next batch. That is not desirable for me.
It works like this because JobScheduler is an actor: after it reports the error it moves on to the next message, which starts the next batch job, while ssc.awaitTermination() runs in another thread and only reacts after the next batch has started.

Is there a way to stop before the next job is submitted?
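
(Just as an illustration of the kind of thing being asked about: one way to at least narrow the window is to trigger the stop from inside the output operation itself, from a separate thread, rather than waiting for awaitTermination. process(rdd) is a placeholder for the real per-batch work, and this sketch still does not fully close the race, since the scheduler may already have submitted the next job.)

stream.foreachRDD { rdd =>
  try {
    process(rdd)  // placeholder: per-batch work plus the offset save
  } catch {
    case e: Exception =>
      // stop from a separate thread: stopping synchronously from inside a
      // running job can deadlock while stop() waits for that job to finish
      new Thread(new Runnable {
        override def run(): Unit =
          ssc.stop(stopSparkContext = true, stopGracefully = false)
      }).start()
      throw e  // rethrow so the batch is still reported as failed
  }
}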



--
VARUN SHARMA
Flipkart
Bangalore
