flink-user mailing list archives

From Flavio Pompermaier <pomperma...@okkam.it>
Subject Re: Different results on local and on cluster
Date Mon, 04 Jul 2016 10:34:53 GMT
Sorry, I meant to write Kryo, but I'm on my mobile...
On 4 Jul 2016 12:34 p.m., "Flavio Pompermaier" <pompermaier@okkam.it> wrote:

> Because I don't see any good reason for that... maybe all the keyo
> serialization errors that I get from time to time could also be symptomatic
> of some other error in how Flink manages its internal buffers... but this
> is just another personal guess of mine.
> On 4 Jul 2016 12:29 p.m., "Ufuk Celebi" <uce@apache.org> wrote:
>
>> It's not possible to tell. You would have to look into the logs of the
>> job manager to check what happened. The task manager that was not killed
>> could have re-connected to the job manager, if it was restarted quickly
>> after the failure. Why do you think that the task manager would
>> influence the job result though?
>>
>> On Mon, Jul 4, 2016 at 12:23 PM, Flavio Pompermaier
>> <pompermaier@okkam.it> wrote:
>> > No, I haven't.
>> > I fear that a taskmanager that was not killed could have been the cause
>> > of this problem.
>> > The other day I ran the job and discovered that on some nodes there were
>> > zombie taskmanagers that weren't terminated during the stop-cluster.
>> > What do you think? What happens in this situation? Are old taskmanagers
>> > still able to interfere with the new jobmanager?
>> > I didn't see them in the web dashboard, so I thought they weren't
>> > problematic at all and I just killed them.
>> >
>> > On 4 Jul 2016 12:07 p.m., "Ufuk Celebi" <uce@apache.org> wrote:
>> >
>> > I guess Aljoscha was referring to whether you also have broadcasted
>> > input or something like it?
>> >
>> > On Fri, Jul 1, 2016 at 7:05 PM, Flavio Pompermaier
>> > <pompermaier@okkam.it> wrote:
>> >> what do you mean exactly?
>> >>
>> >> On 1 Jul 2016 18:58, "Aljoscha Krettek" <aljoscha@apache.org> wrote:
>> >>>
>> >>> Hi,
>> >>> do you have any data in the coGroup/groupBy operators that you use,
>> >>> besides the input data?
>> >>>
>> >>> Cheers,
>> >>> Aljoscha
>> >>>
>> >>> On Fri, 1 Jul 2016 at 14:17 Flavio Pompermaier <pompermaier@okkam.it>
>> >>> wrote:
>> >>>>
>> >>>> Hi to all,
>> >>>> I have a Flink job that computes data correctly when launched locally
>> >>>> from my IDE, but not when launched on the cluster.
>> >>>>
>> >>>> Is there any suggestion/example for identifying the problematic
>> >>>> operators in this case?
>> >>>> I think the root cause is that some operator (e.g. coGroup/groupBy,
>> >>>> etc.), which I assume has all the data for a key, may not actually
>> >>>> have it (because the data is partitioned among nodes).
>> >>>>
>> >>>> Any help is appreciated,
>> >>>> Flavio
>>
>
