Mailing-List: contact dev-help@spark.apache.org; run by ezmlm
Precedence: bulk
MIME-Version: 1.0
References: <CAH7vnfjQqUnk__3tx+_jjM6wOJkmrbZOh8xSTFmcPsgr5soueA@mail.gmail.com>
 <CAAswR-4NEO_WbtQqb_yqL5wLQvftDa8wYj1YBfcoQYygXntL_w@mail.gmail.com>
 <CAKWX9VVj95C9+q6z+hwQMp5Z6LjsH17C3qiPE5jWxDzxSzQ7jg@mail.gmail.com>
 <CAAswR-7hzkFXr8trXSvNJMh7nBdg9=r=O6cwQ24avBY-dpagTQ@mail.gmail.com>
 <CAKx7Bf9dBNzqHHVK6+3-y8cBCPzOV4_djUEkJKnc4AqGNHVQWQ@mail.gmail.com>
 <CABqdQ5ZVP1EJPErbJ1w7pDTcEj-NFOjmgJyGDjRASJcnPTMRvA@mail.gmail.com>
 <BB658288-D1B5-4A69-BFE7-15B0C2EB8A1B@gmail.com> <CAKWX9VVOAQ057n7ejPEFhJvX6uiVax+yiaH-mXbpYP=59+U4Qg@mail.gmail.com>
 <0072CAD8-3BAD-4AE9-91B9-5A6C18AED293@gmail.com>
In-Reply-To: <0072CAD8-3BAD-4AE9-91B9-5A6C18AED293@gmail.com>
From: Amit Sela <amitsela33@gmail.com>
Date: Thu, 20 Oct 2016 10:30:17 +0000
Message-ID: <CABqdQ5YfXA78bfkW6+UAm0MrzQWZeoE0hXf5fNyW6YDC7m4acQ@mail.gmail.com>
Subject: Re: StructuredStreaming status
To: Matei Zaharia <matei.zaharia@gmail.com>
Cc: dev <dev@spark.apache.org>
Content-Type: multipart/alternative; boundary=e89a8ff250b4181f4d053f496ab3
archived-at: Thu, 20 Oct 2016 10:35:42 -0000

--e89a8ff250b4181f4d053f496ab3
Content-Type: text/plain; charset=UTF-8

On Thu, Oct 20, 2016 at 7:40 AM Matei Zaharia <matei.zaharia@gmail.com> wrote:

> Yeah, as Shivaram pointed out, there have been research projects that
> looked at it. Also, Structured Streaming was explicitly designed to not
> make microbatching part of the API or part of the output behavior (tying
> triggers to it).
>
But Streaming Query sources
<https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/Source.scala#L41>
are still designed with microbatches in mind. Can this be removed, leaving
offset tracking to the executors?
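
To make the question concrete, here is a toy sketch in plain Python (not
Spark code) of what I mean by a source contract that is shaped around
microbatches. As far as I understand the linked trait, it exposes roughly a
"latest offset" call plus a "give me the data between two offsets" call, and
the driver decides the range for every batch. All class, method, and function
names below are hypothetical and only illustrate that shape.

```python
from typing import List, Optional, Tuple


class ToyMicroBatchSource:
    """Toy stand-in for a microbatch-shaped source (hypothetical, not Spark's API)."""

    def __init__(self) -> None:
        self._records: List[str] = []

    def append(self, record: str) -> None:
        # Simulates new data arriving at the source.
        self._records.append(record)

    def get_offset(self) -> Optional[int]:
        # Latest available offset (here: number of records seen), or None if empty.
        return len(self._records) if self._records else None

    def get_batch(self, start: Optional[int], end: int) -> List[str]:
        # Records after offset `start` up to offset `end`; start=None means "from the beginning".
        begin = 0 if start is None else start
        return self._records[begin:end]


def run_driver_iteration(
    source: ToyMicroBatchSource, committed: Optional[int]
) -> Tuple[List[str], Optional[int]]:
    """One driver-side iteration: the driver, not the reader, picks the offset range."""
    latest = source.get_offset()
    if latest is None or latest == committed:
        # Nothing new to process, keep the committed offset unchanged.
        return [], committed
    batch = source.get_batch(committed, latest)
    return batch, latest


if __name__ == "__main__":
    src = ToyMicroBatchSource()
    src.append("a")
    src.append("b")
    batch, committed = run_driver_iteration(src, None)
    print(batch, committed)
```

In this shape the offset range is always chosen centrally, once per batch,
which is the part I am asking about: whether the readers on the executors
could track their own offsets instead.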

> However, when people begin working on that is a function of demand
> relative to other features. I don't think we can commit to one plan before
> exploring more options, but basically there is Shivaram's project, which
> adds a few new concepts to the scheduler, and there's the option to reduce
> control plane latency in the current system, which hasn't been heavily
> optimized yet but should be doable (lots of systems can handle 10,000s of
> RPCs per second).
>
> Matei
>
> On Oct 19, 2016, at 9:20 PM, Cody Koeninger <cody@koeninger.org> wrote:
>
> I don't think it's just about what to target - if you could target 1ms
> batches, without harming 1 second or 1 minute batches... why wouldn't you?
> I think it's about having a clear strategy and dedicating resources to it.
> If scheduling batches at an order of magnitude or two lower latency is the
> strategy, and that's actually feasible, that's great. But I haven't seen
> that clear direction, and this is by no means a recent issue.
>
> On Oct 19, 2016 7:36 PM, "Matei Zaharia" <matei.zaharia@gmail.com> wrote:
>
> I'm also curious whether there are concerns other than latency with the
> way stuff executes in Structured Streaming (now that the time steps don't
> have to act as triggers), as well as what latency people want for various
> apps.
>
> The stateful operator designs for streaming systems aren't inherently
> "better" than micro-batching -- they lose a lot of stuff that is possible
> in Spark, such as load balancing work dynamically across nodes, speculative
> execution for stragglers, scaling clusters up and down elastically, etc.
> Moreover, Spark itself could execute the current model with much lower
> latency. The question is just what combinations of latency, throughput,
> fault recovery, etc to target.
>
> Matei
>
> On Oct 19, 2016, at 2:18 PM, Amit Sela <amitsela33@gmail.com> wrote:
>
>
>
> On Thu, Oct 20, 2016 at 12:07 AM Shivaram Venkataraman
> <shivaram@eecs.berkeley.edu> wrote:
>
> At the AMPLab we've been working on a research project that looks at
> just the scheduling latencies and on techniques to get lower
> scheduling latency. It moves away from the micro-batch model, but
> reuses the fault tolerance etc. in Spark. However we haven't yet
> figured out all the parts in integrating this with the rest of
> structured streaming. I'll try to post a design doc / SIP about this
> soon.
>
> On a related note - are there other problems users face with
> micro-batch other than latency?
>
> I think that the fact that they serve as an output trigger is a problem,
> but Structured Streaming seems to resolve this now.
>
>
> Thanks
> Shivaram
>
> On Wed, Oct 19, 2016 at 1:29 PM, Michael Armbrust
> <michael@databricks.com> wrote:
> > I know people are seriously thinking about latency.  So far that has not
> > been the limiting factor in the users I've been working with.
> >
> > On Wed, Oct 19, 2016 at 1:11 PM, Cody Koeninger <cody@koeninger.org> wrote:
> >>
> >> Is anyone seriously thinking about alternatives to microbatches?
> >>
> >> On Wed, Oct 19, 2016 at 2:45 PM, Michael Armbrust
> >> <michael@databricks.com> wrote:
> >> > Anything that is actively being designed should be in JIRA, and it seems
> >> > like you found most of it.  In general, release windows can be found on
> >> > the wiki.
> >> >
> >> > 2.1 has a lot of stability fixes as well as the Kafka support you
> >> > mentioned. It may also include some of the following.
> >> >
> >> > The items I'd like to start thinking about next are:
> >> >  - Evicting state from the store based on event time watermarks
> >> >  - Sessionization (grouping together related events by key / eventTime)
> >> >  - Improvements to the query planner (remove some of the restrictions on
> >> >    what queries can be run).
> >> >
> >> > This is roughly in order based on what I've been hearing users hit the
> >> > most.
> >> > Would love more feedback on what is blocking real use cases.
> >> >
> >> > On Tue, Oct 18, 2016 at 1:51 AM, Ofir Manor <ofir.manor@equalum.io> wrote:
> >> >>
> >> >> Hi,
> >> >> I hope it is the right forum.
> >> >> I am looking for some information on what to expect from
> >> >> StructuredStreaming in its next releases to help me choose when / where to
> >> >> start using it more seriously (or where to invest in workarounds and where
> >> >> to wait). I couldn't find a good place where such planning is discussed
> >> >> for 2.1 (like, for example, ML and SPARK-15581).
> >> >> I'm aware of the 2.0 documented limits
> >> >> (http://spark.apache.org/docs/2.0.1/structured-streaming-programming-guide.html#unsupported-operations),
> >> >> like no support for multiple aggregation levels, joins strictly to a
> >> >> static dataset (no SCD or stream-stream), limited sources / sinks (like
> >> >> no sink for interactive queries), etc.
> >> >> I'm also aware of some changes that have landed in master, like the new
> >> >> Kafka 0.10 source (and its ongoing improvements) in SPARK-15406, the
> >> >> metrics in SPARK-17731, and some improvements for the file source.
> >> >> If I remember correctly, the discussion on Spark release cadence concluded
> >> >> with a preference for a four-month cycle, with a likely code freeze pretty
> >> >> soon (end of October). So I believe the scope for 2.1 should likely be quite
> >> >> clear to some, and that 2.2 planning should likely be starting about now.
> >> >> Any visibility / sharing will be highly appreciated!
> >> >> thanks in advance,
> >> >>
> >> >> Ofir Manor
> >> >>
> >> >> Co-Founder & CTO | Equalum
> >> >>
> >> >> Mobile: +972-54-7801286 <054-780-1286> | Email: ofir.manor@equalum.io
> >> >
> >> >
> >
> >
>

--e89a8ff250b4181f4d053f496ab3--