flink-user mailing list archives

From Flavio Pompermaier <pomperma...@okkam.it>
Subject Re: Different results on local and on cluster
Date Mon, 04 Jul 2016 10:34:09 GMT
Because I don't see any good reason for that... maybe all the Kryo
serialization errors that I get from time to time could also be symptomatic
of some other error in how Flink manages the internal buffers... but this
is just another personal guess of mine.
On 4 Jul 2016 12:29 p.m., "Ufuk Celebi" <uce@apache.org> wrote:

> It's not possible to tell. You would have to look into the logs of the
> job manager to check what happened. The task manager that was not killed
> could have re-connected to the job manager, if it was restarted quickly
> after the failure. Why do you think that the task manager would
> influence the job result though?
>
> On Mon, Jul 4, 2016 at 12:23 PM, Flavio Pompermaier
> <pompermaier@okkam.it> wrote:
> > No, I haven't.
> > I fear that a taskmanager that was not killed could have been the cause
> > of this problem. The other day I ran the job and discovered that on some
> > nodes there were zombie taskmanagers that weren't terminated during the
> > stop-cluster.
> > What do you think? What happens in this situation? Are old taskmanagers
> > still able to interfere with the new jobmanager?
> > In the web dashboard I didn't see them, so I thought they weren't
> > problematic at all and I just killed them.
> >
> > On 4 Jul 2016 12:07 p.m., "Ufuk Celebi" <uce@apache.org> wrote:
> >
> > I guess Aljoscha was referring to whether you also have broadcasted
> > input or something like it?
> >
> > On Fri, Jul 1, 2016 at 7:05 PM, Flavio Pompermaier
> > <pompermaier@okkam.it> wrote:
> >> What do you mean exactly?
> >>
> >> On 1 Jul 2016 18:58, "Aljoscha Krettek" <aljoscha@apache.org> wrote:
> >>>
> >>> Hi,
> >>> do you have any data in the coGroup/groupBy operators that you use,
> >>> besides the input data?
> >>>
> >>> Cheers,
> >>> Aljoscha
> >>>
> >>> On Fri, 1 Jul 2016 at 14:17 Flavio Pompermaier <pompermaier@okkam.it>
> >>> wrote:
> >>>>
> >>>> Hi to all,
> >>>> I have a Flink job that computes data correctly when launched locally
> >>>> from my IDE while it doesn't when launched on the cluster.
> >>>>
> >>>> Is there any suggestion/example for identifying the problematic
> >>>> operators in this situation?
> >>>> I think the root cause is that some operator (e.g. coGroup, groupBy,
> >>>> etc.), which I assume receives all the data for a key, may in fact
> >>>> not (because the data is partitioned among nodes).
> >>>>
> >>>> Any help is appreciated,
> >>>> Flavio
>
