Subject: Re: DeserializationSchema isEndOfStream usage?
From: David Kim <david.kim@braintreepayments.com>
To: user@flink.apache.org
Date: Thu, 21 Jan 2016 16:51:03 -0600

Hi Robert!

Thanks for reaching out. I ran into an issue and wasn't sure if this was due to a misconfiguration on my end or if this is a real bug. I have one DataStream and I'm sinking it to two different Kafka sinks. When the job starts, I run into this error:

org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
    at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply$mcV$sp(JobManager.scala:659)
    at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:605)
    at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:605)
    at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
    at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
    at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.UnsupportedOperationException: The accumulator 'producer-record-retry-rate' already exists and cannot be added.
    at org.apache.flink.api.common.functions.util.AbstractRuntimeUDFContext.addAccumulator(AbstractRuntimeUDFContext.java:121)
    at org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducerBase.open(FlinkKafkaProducerBase.java:204)
    at org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:36)
    at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.open(AbstractUdfStreamOperator.java:89)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.openAllOperators(StreamTask.java:305)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:227)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:567)
    at java.lang.Thread.run(Thread.java:745)

The particular accumulator the exception complains about changes, meaning it's not always 'producer-record-retry-rate' -- most likely due to the non-deterministic ordering of the collection. Any guidance appreciated!

I'm using 1.0-SNAPSHOT and my two sinks are FlinkKafkaProducer08.

The Flink code looks something like this:

val stream: DataStream[Foo] = ...

val kafkaA = new FlinkKafkaProducer08[Foo]...
val kafkaB = new FlinkKafkaProducer08[Foo]...

stream
  .addSink(kafkaA)

stream
  .addSink(kafkaB)
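For reference, here is a minimal self-contained sketch of a job with this shape. The broker address, topic names, and the use of String elements with SimpleStringSchema are illustrative assumptions, since the real constructor arguments are elided above:

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer08
import org.apache.flink.streaming.util.serialization.SimpleStringSchema

object TwoKafkaSinksJob {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Any source works; a handful of string elements keeps the sketch minimal.
    val stream: DataStream[String] = env.fromElements("a", "b", "c")

    // Two producers for one stream. Broker list and topic names are assumed.
    val kafkaA = new FlinkKafkaProducer08[String](
      "localhost:9092", "topic-a", new SimpleStringSchema())
    val kafkaB = new FlinkKafkaProducer08[String](
      "localhost:9092", "topic-b", new SimpleStringSchema())

    // Adding both sinks to the same stream is what triggers the
    // "accumulator already exists" failure in FlinkKafkaProducerBase.open().
    stream.addSink(kafkaA)
    stream.addSink(kafkaB)

    env.execute("two-kafka-sinks")
  }
}

The collision suggests both producers register their metric accumulators under the same names in a shared per-task context, so it would presumably only surface when the two sinks end up running in the same task.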
Thanks,
David

On Wed, Jan 20, 2016 at 1:34 PM, Robert Metzger <rmetzger@apache.org> wrote:

> I've now merged the pull request. DeserializationSchema.isEndOfStream()
> should now be evaluated correctly by the Kafka 0.9 and 0.8 connectors.
>
> Please let me know if the updated code has any issues. I'll fix them asap.
>
> On Wed, Jan 13, 2016 at 5:06 PM, David Kim <david.kim@braintreepayments.com> wrote:
>
>> Thanks Robert! I'll be keeping tabs on the PR.
>>
>> Cheers,
>> David
>>
>> On Mon, Jan 11, 2016 at 4:04 PM, Robert Metzger <metrobert@gmail.com> wrote:
>>
>>> Hi David,
>>>
>>> In theory, isEndOfStream() is absolutely the right way to go for
>>> stopping data sources in Flink. That it's not working as expected is a
>>> bug. I have a pending pull request for adding a Kafka 0.9 connector,
>>> which fixes this issue as well (for all supported Kafka versions).
>>>
>>> Sorry for the inconvenience. If you want, you can check out the branch
>>> of the PR and build Flink yourself to get the fix. I hope I can merge
>>> the connector to master this week; then the fix will be available in
>>> 1.0-SNAPSHOT as well.
>>>
>>> Regards,
>>> Robert
>>>
>>> Sent from my iPhone
>>>
>>> On 11.01.2016, at 21:39, David Kim <david.kim@braintreepayments.com> wrote:
>>>
>>> Hello all,
>>>
>>> I saw that DeserializationSchema has an API "isEndOfStream()":
>>> https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/util/serialization/DeserializationSchema.java
>>>
>>> Can isEndOfStream be utilized to somehow terminate a streaming Flink
>>> job?
>>>
>>> I was under the impression that if we return "true" we can control when
>>> a stream closes. The use case I had in mind was controlling when
>>> unit/integration tests would terminate a Flink job. We can rely on the
>>> fact that a test/spec knows how many items it expects to consume and
>>> can then switch isEndOfStream to return true.
>>>
>>> Am I misunderstanding the intention of isEndOfStream?
>>>
>>> I also set a breakpoint on isEndOfStream and saw that it was never hit
>>> when using "FlinkKafkaConsumer082" to pass in a DeserializationSchema
>>> implementation.
>>>
>>> Currently testing on 1.0-SNAPSHOT.
>>>
>>> Cheers!
>>> David
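A minimal sketch of the test-termination pattern discussed above: a schema that flips isEndOfStream to true after a fixed number of records. The String payload, UTF-8 encoding, and record limit are assumptions; note that each parallel source subtask gets its own schema instance, so the limit applies per subtask:

import org.apache.flink.api.common.typeinfo.{BasicTypeInfo, TypeInformation}
import org.apache.flink.streaming.util.serialization.DeserializationSchema

// Deserializes UTF-8 strings and signals end-of-stream once `maxRecords`
// elements have been seen, letting a test job shut down on its own.
class BoundedStringSchema(maxRecords: Long) extends DeserializationSchema[String] {
  private var seen = 0L

  override def deserialize(message: Array[Byte]): String = {
    seen += 1
    new String(message, "UTF-8")
  }

  // With the fixed connectors, returning true here stops the Kafka source.
  override def isEndOfStream(nextElement: String): Boolean = seen >= maxRecords

  override def getProducedType: TypeInformation[String] =
    BasicTypeInfo.STRING_TYPE_INFO
}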
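And a sketch of wiring that schema into FlinkKafkaConsumer082 in an integration test. The ZooKeeper/broker addresses, topic, group id, and expected record count are placeholders:

import java.util.Properties

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer082

object BoundedConsumeExample {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.setProperty("zookeeper.connect", "localhost:2181") // needed by the 0.8 consumer
    props.setProperty("bootstrap.servers", "localhost:9092")
    props.setProperty("group.id", "test-group")

    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Consume "test-topic" until 100 records have been seen per subtask.
    env
      .addSource(new FlinkKafkaConsumer082[String](
        "test-topic", new BoundedStringSchema(100), props))
      .print()

    // With the fixed connectors, execute() should return once every
    // source subtask has reported isEndOfStream == true.
    env.execute("bounded-consume")
  }
}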