flink-user mailing list archives

From Robert Metzger <rmetz...@apache.org>
Subject Re: Job hangs
Date Tue, 26 Apr 2016 08:35:08 GMT
Hi Timur,

thank you for sharing the source code of your job. That is helpful!
It's a large pipeline with 7 joins and 2 co-groups. Maybe your job is much
more IO-heavy with the larger input data because all the joins start
spilling?
Our monitoring, in particular for batch jobs, is really not very advanced.
If we had some monitoring showing the spill status, we would maybe see that
the job is still running.

How long did you wait until you declared the job hanging?

Regards,
Robert


On Tue, Apr 26, 2016 at 10:11 AM, Ufuk Celebi <uce@apache.org> wrote:

> No.
>
> If you run on YARN, the YARN logs are the relevant ones for the
> JobManager and TaskManager. The client log submitting the job should
> be found in /log.
>
> – Ufuk
>
> On Tue, Apr 26, 2016 at 10:06 AM, Timur Fayruzov
> <timur.fairuzov@gmail.com> wrote:
> > I will do it tomorrow (my time). Logs don't show anything unusual. Are there any
> > logs besides what's in flink/log and the YARN container logs?
> >
> > On Apr 26, 2016 1:03 AM, "Ufuk Celebi" <uce@apache.org> wrote:
> >
> > Hey Timur,
> >
> > is it possible to connect to the VMs and get stack traces of the Flink
> > processes as well?
> >
> > We can first have a look at the logs, but the stack traces will be
> > helpful if we can't figure out what the issue is.
> >
> > – Ufuk
> >
> > On Tue, Apr 26, 2016 at 9:42 AM, Till Rohrmann <trohrmann@apache.org> wrote:
> >> Could you share the logs with us, Timur? That would be very helpful.
> >>
> >> Cheers,
> >> Till
> >>
> >> On Apr 26, 2016 3:24 AM, "Timur Fayruzov" <timur.fairuzov@gmail.com>
> >> wrote:
> >>>
> >>> Hello,
> >>>
> >>> Now I'm at the stage where my job seems to hang completely. Source code is
> >>> attached (it won't compile, but I think it gives a very good idea of what
> >>> happens). Unfortunately I can't provide the datasets. Most of them are about
> >>> 100-500MM records; I try to process them on an EMR cluster with 40 tasks, 6GB
> >>> memory for each.
> >>>
> >>> It was working for smaller input sizes. Any idea on what I can do
> >>> differently is appreciated.
> >>>
> >>> Thanks,
> >>> Timur
>
