From: Thomas Groh <tgroh@google.com>
Date: Mon, 8 Aug 2016 15:50:57 -0700
Subject: Re: Is Beam pipeline runtime behavior inconsistent?
To: user@beam.incubator.apache.org, amir bahmanyari <amirtousa@yahoo.com>

You would see performance no better than single-threaded behavior if you group everything into a single key, which is why this approach is strongly discouraged. You can still get continuous output, depending on the triggering, but you lose all of the scaling benefits of running a pipeline as opposed to a simple Java program, and you may incur some additional overhead.

To enforce this sort of threading you would do something along the lines of:

    kafkarecords.apply(WithKeys.<Integer, String>of(1))
        .apply(GroupByKey.<Integer, String>create())
        .apply(Values.<Iterable<String>>create())
        .apply(ParDo.of(new DoFn<Iterable<String>, String>() {...}));

where the DoFn unrolls its input and applies the processing to each element.
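For illustration, a minimal sketch of such an unrolling DoFn, in the same pre-annotation DoFn style used elsewhere in this thread (processRecord is a hypothetical stand-in for the per-record logic):

    DoFn<Iterable<String>, String> unrollFn = new DoFn<Iterable<String>, String>() {
      public void processElement(ProcessContext c) throws Exception {
        // All records share the single key, so they arrive here as one Iterable
        // and are handled one at a time by this DoFn instance.
        for (String record : c.element()) {
          c.output(processRecord(record)); // hypothetical per-record helper
        }
      }
    };

Because every record shares the key, one DoFn instance sees the whole Iterable for a window pane, which is what makes the processing effectively single-threaded.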
On Mon, Aug 8, 2016 at 2:37 PM, amir bahmanyari <amirtousa@yahoo.com> wrote:

> Thanks so much Thomas.
> Fantastic answer & great learning about what's really going on underneath the hood.
> Have a question on your suggestion: "To do so, you would key the inputs to a single static key and apply a GroupByKey, running the processing method on the output Iterable produced by the GroupByKey"...
> Wouldn't doing so defeat the "real-time streaming" objectives?
> To me, the above leads to a simulation of a simple single-threaded Java process, but executing in a massively parallel infrastructure in a "fancy" way :-)
> Is there an example that demonstrates how to actually implement your suggestion above without any hidden loopholes, please?
> I can at least try it and see how far it gets for R&D purposes & share the results with the community.
> Cheers + have a wonderful day.
>
> ------------------------------
> *From:* Thomas Groh <tgroh@google.com>
> *To:* user@beam.incubator.apache.org; amir bahmanyari <amirtousa@yahoo.com>
> *Sent:* Monday, August 8, 2016 1:44 PM
> *Subject:* Re: Is Beam pipeline runtime behavior inconsistent?
>
> There's no way to guarantee that exactly one record is processed at a time. This is part of the design of ParDo to work efficiently across multiple processes and machines [1], where multiple instances of a DoFn must exist in order for progress to be made in a timely fashion. This includes processing the same element across multiple machines at the same time, with only one of the results being available in the output PCollection, as well as retries of failed elements.
>
> A runner is required to interact with a DoFn instance in a single-threaded manner; however, it is permitted to have multiple different DoFn instances active within a single process and across processes at any given time (for the same reasons as above). There's no support in the Beam model to restrict this type of execution. We do not encourage sharing objects between DoFn instances; any shared state must be accessed in a thread-safe manner, and modifications to shared state should be idempotent, as otherwise retries and speculative execution may cause that state to be inconsistent. A DoFn will be reused for multiple elements within a single bundle, and may be reused across multiple bundles - if you require the DoFn to be "fresh" per element, it should perform any required setup at the start of the processElement method.
>
> The best that can be done, if it is absolutely required to restrict processing to a single element at a time, would be to group all of the elements to a single key. Note that this will not solve the problem in all cases, as a runner is permitted to execute the group of elements multiple times so long as it only takes one completed bundle as the result; additionally, this removes the ability of the runner to balance work and introduces a performance bottleneck. To do so, you would key the inputs to a single static key and apply a GroupByKey, running the processing method on the output Iterable produced by the GroupByKey (directly; expanding the input Iterable into a separate PCollection allows a runner to rebalance the elements, which will reintroduce parallelism).
>
> [1] https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ParDo.java#L360
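A minimal sketch of those two recommendations, not from the original thread: per-element setup done at the start of processElement, and shared state touched only through thread-safe, idempotent operations (RecordState and the map name are hypothetical):

    // java.util.concurrent.ConcurrentHashMap; shared across DoFn instances in this JVM.
    static final ConcurrentMap<String, RecordState> SHARED = new ConcurrentHashMap<>();

    DoFn<String, String> fn = new DoFn<String, String>() {
      public void processElement(ProcessContext c) throws Exception {
        // "Fresh" per element: any setup the element needs happens here, not in
        // instance fields populated once per DoFn instance.
        RecordState state = new RecordState(c.element()); // hypothetical value type
        // Idempotent update: putIfAbsent leaves the map in the same state even if a
        // retry or speculative execution processes this element a second time.
        SHARED.putIfAbsent(c.element(), state);
        c.output(c.element());
      }
    };

putIfAbsent (and similar compute-style operations) keeps the shared map consistent even when the same element is processed more than once.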
> On Mon, Aug 8, 2016 at 12:46 PM, amir bahmanyari <amirtousa@yahoo.com> wrote:
>
> Hi Thomas,
> Thanks so much for your response. Here are answers to your questions.
>
> You have a specific collection of records stored in Kafka. You run your pipeline, and observe duplicate elements. Is that accurate?
> ==>> I send records to Kafka from my laptop. I use KafkaIO() to receive the records. I have confirmed that I don't get duplicates from Kafka. However, for some reason, certain parts of my code execute beyond the actual number of expected records, and subsequently produce extra resulting data.
> I tried playing with the triggering - stretching the window interval, discardingFiredPanes(), etc., all kinds of modes. Same result. How can I guarantee that one record at a time executes in one unique instance of the inner-class object?
> I have all the shared objects synchronized and am using Java concurrent hashmaps. How can I guarantee synchronized operations amongst "parallel pipelines"? Analogous to multiple threads accessing a shared object and trying to modify it...
>
> Here is my current KafkaIO() call:
>
>     PCollection<String> kafkarecords = p.apply(KafkaIO.read()
>             .withBootstrapServers("kafkahost:9092")
>             .withTopics(topics)
>             .withValueCoder(StringUtf8Coder.of())
>             .withoutMetadata())
>         .apply(Values.<String>create())
>         .apply(Window.<String>into(FixedWindows.of(Duration.standardMinutes(1)))
>             .triggering(AfterWatermark.pastEndOfWindow())
>             .withAllowedLateness(Duration.ZERO)
>             .discardingFiredPanes());
>
>     kafkarecords.apply(ParDo.named("ProcessLRKafkaData").of(new DoFn<String, String>() { ... // I expect one record at a time to one object here
>
> Have you confirmed that you're getting duplicate records via other library transforms (such as applying Count.globally() to kafkarecords)?
> ==>> No duplicates from Kafka.
>
> Additionally, I'm not sure what you mean by "executes till a record lands on method"
> ==>> Sorry for my confusing statement. Like I mentioned above, I expect each record coming from Kafka gets assigned to one instance of the inner class, and therefore one instance of the pipeline executes it in parallel with others executing their own unique records.
>
> Additionally additionally, is this reproducible if you execute with the DirectRunner?
> ==>> I have not tried DirectRunner. Should I?
>
> Thanks so much Thomas.
>
> ------------------------------
> *From:* Thomas Groh <tgroh@google.com>
> *To:* user@beam.incubator.apache.org; amir bahmanyari <amirtousa@yahoo.com>
> *Sent:* Monday, August 8, 2016 11:43 AM
> *Subject:* Re: Is Beam pipeline runtime behavior inconsistent?
>
> Just to make sure I understand the problem:
>
> You have a specific collection of records stored in Kafka. You run your pipeline, and observe duplicate elements. Is that accurate?
>
> Have you confirmed that you're getting duplicate records via other library transforms (such as applying Count.globally() to kafkarecords)?
>
> Additionally, I'm not sure what you mean by "executes till a record lands on method"
>
> Additionally additionally, is this reproducible if you execute with the DirectRunner?
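A rough sketch of those two checks, assuming the Beam 0.x APIs used in this thread (the direct runner's class and module names may differ between versions):

    // Per-window count to compare against the number of records published to Kafka;
    // withoutDefaults() because kafkarecords is in fixed windows, not the global window.
    PCollection<Long> counts =
        kafkarecords.apply(Count.<String>globally().withoutDefaults());

    counts.apply(ParDo.of(new DoFn<Long, Void>() {
      public void processElement(ProcessContext c) throws Exception {
        System.out.println("records counted in pane: " + c.element());
      }
    }));

    // Running the same pipeline locally (e.g. with the DirectRunner from the
    // beam-runners-direct-java module) helps show whether the extra output is
    // runner-specific.
    PipelineOptions directOptions = PipelineOptionsFactory.create();
    directOptions.setRunner(DirectRunner.class);
    Pipeline local = Pipeline.create(directOptions);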
>
> On Sun, Aug 7, 2016 at 11:44 PM, amir bahmanyari <amirtousa@yahoo.com> wrote:
>
> Hi Colleagues,
> I refrained from posting this email before completing thorough testing. I think I did.
> My core code works perfectly & produces the expected result every single time without wrapping it with Beam KafkaIO to receive the data.
> Without KafkaIO, it receives the records from a flat data file. I repeated it and it always produced the right result.
> With a Beam KafkaIO included and the exact same code embedded in an anonymous class running Beam pipelines, I get a different result every time I rerun it.
> Below is the snippet from where KafkaIO executes until a record lands on the processing method.
> Kafka sends a precise number of records. No duplicates. All good.
> While executing in Beam, when the records are finished & I expect a correct result, it always produces something different.
> Different in different runs.
> I appreciate shedding light on this issue. And thanks for your valuable time as always.
> Amir-
>
>     public static synchronized void main(String[] args) throws Exception {
>
>         // Create Beam options for the Flink runner.
>         FlinkPipelineOptions options = PipelineOptionsFactory.as(FlinkPipelineOptions.class);
>         // Set the streaming engine as FlinkRunner
>         options.setRunner(FlinkPipelineRunner.class);
>         // This is a streaming process (as opposed to batch)
>         options.setStreaming(true);
>         // Create the DAG pipeline for parallel processing of independent LR records
>         Pipeline p = Pipeline.create(options);
>         // Kafka broker topic is identified as "lroad"
>         List<String> topics = Arrays.asList("lroad");
>
>         PCollection<String> kafkarecords = p.apply(KafkaIO.read()
>                 .withBootstrapServers("kafkahost:9092")
>                 .withTopics(topics)
>                 .withValueCoder(StringUtf8Coder.of())
>                 .withoutMetadata())
>             .apply(Values.<String>create())
>             .apply(Window.<String>into(FixedWindows.of(Duration.standardMinutes(1)))
>                 .triggering(AfterWatermark.pastEndOfWindow())
>                 .withAllowedLateness(Duration.ZERO)
>                 .accumulatingFiredPanes());
>
>         kafkarecords.apply(ParDo.named("ProcessLRKafkaData").of(new DoFn<String, String>() {
>             public void processElement(ProcessContext ctx) throws Exception {
>                 // My core logic code here.
>             }
>         }));
>         .
>         .
>         p.run(); // Start Beam pipeline(s) in the Flink cluster
>     } // of main
>     } // of class
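For what it's worth, the two versions of the windowing code in this thread differ in their pane accumulation mode; a side-by-side sketch of the two choices, with their usual semantics (worth double-checking against the Beam docs for the version in use):

    // Emits only the elements that arrived since the last firing of the
    // trigger for this window.
    .discardingFiredPanes()

    // Re-emits everything seen so far in the window on each firing, so later
    // panes repeat earlier elements - downstream logic may see some elements
    // more than once.
    .accumulatingFiredPanes()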