flink-user mailing list archives

From Ufuk Celebi <...@apache.org>
Subject Re: Job hangs
Date Tue, 26 Apr 2016 15:11:14 GMT
Can you please also provide the execution plan via

env.getExecutionPlan()
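For reference, here is a minimal sketch of that call with the Java DataSet API. It assumes flink-java and flink-clients (1.0-era) are on the classpath. The Scala ExecutionEnvironment has the same getExecutionPlan() method. The two tiny inputs and the single join are hypothetical stand-ins for the real job. The method has to be called after the sinks are defined. It returns the optimizer's plan as a JSON string that lists each operator with the strategies the optimizer chose.

```java
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.io.DiscardingOutputFormat;
import org.apache.flink.api.java.tuple.Tuple2;

public class PrintPlan {

    /** Builds a toy two-input join and returns the optimizer's plan as JSON. */
    public static String buildPlanJson() throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Hypothetical stand-ins for the real (large) inputs.
        DataSet<Tuple2<Integer, String>> left =
            env.fromElements(new Tuple2<>(1, "a"), new Tuple2<>(2, "b"));
        DataSet<Tuple2<Integer, String>> right =
            env.fromElements(new Tuple2<>(1, "x"), new Tuple2<>(2, "y"));

        // A sink is required; without one there is no plan to generate.
        left.join(right).where(0).equalTo(0)
            .output(new DiscardingOutputFormat<Tuple2<Tuple2<Integer, String>, Tuple2<Integer, String>>>());

        // Returns the optimized plan as a JSON string (does not run the job).
        return env.getExecutionPlan();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(buildPlanJson());
    }
}
```

In the real job, the same call would go where env.execute() is normally invoked, and the printed JSON would be attached to the reply.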



On Tue, Apr 26, 2016 at 4:23 PM, Timur Fayruzov
<timur.fairuzov@gmail.com> wrote:
> Hello Robert,
>
> I observed progress for 2 hours (meaning the numbers on the dashboard
> kept changing), and then I waited for 2 more hours. I'm sure it had to
> spill at some point, but I figured 2h was enough time.
>
> Thanks,
> Timur
>
> On Apr 26, 2016 1:35 AM, "Robert Metzger" <rmetzger@apache.org> wrote:
>>
>> Hi Timur,
>>
>> thank you for sharing the source code of your job. That is helpful!
>> It's a large pipeline with 7 joins and 2 co-groups. Maybe your job is much
>> more IO-heavy with the larger input data because all the joins start
>> spilling?
>> Our monitoring, in particular for batch jobs, is really not very advanced.
>> If we had some monitoring showing the spill status, we would maybe see that
>> the job is still running.
>>
>> How long did you wait until you declared the job hanging?
>>
>> Regards,
>> Robert
>>
>>
>> On Tue, Apr 26, 2016 at 10:11 AM, Ufuk Celebi <uce@apache.org> wrote:
>>>
>>> No.
>>>
>>> If you run on YARN, the YARN logs are the relevant ones for the
>>> JobManager and TaskManager. The log of the client that submitted the
>>> job should be in /log.
>>>
>>> – Ufuk
>>>
>>> On Tue, Apr 26, 2016 at 10:06 AM, Timur Fayruzov
>>> <timur.fairuzov@gmail.com> wrote:
>>> > I will do it tomorrow (my time). The logs don't show anything
>>> > unusual. Are there any logs besides what's in flink/log and the YARN
>>> > container logs?
>>> >
>>> > On Apr 26, 2016 1:03 AM, "Ufuk Celebi" <uce@apache.org> wrote:
>>> >
>>> > Hey Timur,
>>> >
>>> > is it possible to connect to the VMs and get stack traces of the Flink
>>> > processes as well?
>>> >
>>> > We can first have a look at the logs, but the stack traces will be
>>> > helpful if we can't figure out what the issue is.
>>> >
>>> > – Ufuk
>>> >
>>> > On Tue, Apr 26, 2016 at 9:42 AM, Till Rohrmann <trohrmann@apache.org>
>>> > wrote:
>>> >> Could you share the logs with us, Timur? That would be very helpful.
>>> >>
>>> >> Cheers,
>>> >> Till
>>> >>
>>> >> On Apr 26, 2016 3:24 AM, "Timur Fayruzov" <timur.fairuzov@gmail.com>
>>> >> wrote:
>>> >>>
>>> >>> Hello,
>>> >>>
>>> >>> Now I'm at the stage where my job seems to hang completely. The
>>> >>> source code is attached (it won't compile, but I think it gives a
>>> >>> very good idea of what happens). Unfortunately I can't provide the
>>> >>> datasets. Most of them are about 100-500MM records; I'm trying to
>>> >>> process them on an EMR cluster with 40 tasks and 6GB of memory for
>>> >>> each.
>>> >>>
>>> >>> It was working for smaller input sizes. Any ideas on what I could do
>>> >>> differently are appreciated.
>>> >>>
>>> >>> Thanks,
>>> >>> Timur
>>
>>
>
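Robert's spilling hypothesis can be sanity-checked with a back-of-envelope estimate. The sketch below is hypothetical throughout and relies on several assumptions:

* The managed memory of a slot is split evenly among the job's memory-consuming operators, counting each of the 7 joins and 2 co-groups once. This is a simplification.
* Each TaskManager has one slot, which gives 40 parallel slots.
* The full 6GB is available as heap. On YARN the JVM heap is actually smaller than the container.
* The managed-memory fraction is 0.7, the default of taskmanager.memory.fraction in Flink releases of that time.
* Each record is 100 bytes (an invented figure), and keys are evenly distributed.

None of these numbers are measurements from Timur's job.

```java
// Back-of-envelope check of whether one join input is likely to spill.
// ALL parameter values in main() are hypothetical assumptions, not measurements.
public class SpillEstimate {

    /** Managed memory per memory-consuming operator, assuming the slot's share
     *  is split evenly among all such operators (a simplification). */
    public static double perOperatorBudgetBytes(double heapBytes, double managedFraction,
                                                int slotsPerTaskManager, int memoryConsumers) {
        double managedPerTaskManager = heapBytes * managedFraction;
        double managedPerSlot = managedPerTaskManager / slotsPerTaskManager;
        return managedPerSlot / memoryConsumers;
    }

    /** Bytes of one input that land in each parallel slot, assuming evenly distributed keys. */
    public static double inputBytesPerSlot(long records, double bytesPerRecord, int totalSlots) {
        return records * bytesPerRecord / totalSlots;
    }

    public static void main(String[] args) {
        double gib = 1024.0 * 1024.0 * 1024.0;
        // Hypothetical: 6 GiB heap, fraction 0.7, 1 slot per TaskManager, 9 consumers.
        double budget = perOperatorBudgetBytes(6 * gib, 0.7, 1, 9);
        // Hypothetical: 500MM records of 100 bytes each, spread over 40 slots.
        double input = inputBytesPerSlot(500_000_000L, 100.0, 40);
        System.out.printf("budget per operator: %.2f GiB%n", budget / gib);
        System.out.printf("input per slot:      %.2f GiB%n", input / gib);
        System.out.println(input > budget ? "likely spills to disk" : "may fit in memory");
    }
}
```

Under these assumptions, each operator gets roughly 0.47 GiB, while a single 500MM-record input contributes about 1.16 GiB per slot. Spilling would therefore be expected. Real figures depend on the actual record sizes and cluster configuration.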
