flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Frederick Ayala <frederickay...@gmail.com>
Subject Re: akka.pattern.AskTimeoutException
Date Fri, 15 Jan 2016 11:41:31 GMT
Hi Stephan,

Other jobs run fine but this one is not working on the machine that I was
using previously (16GB RAM) [1]

Is there a way to debug the Akka messages to understand what's happening
between the JobManager and the Client? I can add logging and send it.

Thanks!

Fred

[1] The failure started to happen when I added the flatMap transformation.
Previously I was calling the collect function after the reduceGroup and
then using Scala's flatten function. Since this was very slow and failed
with large datafile I used Flink to flatten the list of lists and now it
faster.
On Jan 15, 2016 11:51, "Stephan Ewen" <sewen@apache.org> wrote:

> Hi!
>
> Do you get this problem with other Jobs as well?
>
> The logs suggest that the JobManager receives the job and starts tasks,
> but the Client thinks it lost connection.
>
> Greetings,
> Stephan
>
>
> On Fri, Jan 15, 2016 at 10:31 AM, Frederick Ayala <
> frederickayala@gmail.com> wrote:
>
>> Hi Robert,
>>
>> Thanks for your reply.
>>
>> I set the akka.ask.timeout to 10k seconds just to see what happened. I
>> tried different values but non did the trick.
>>
>> My problem was solved by using a machine with more RAM. However, it was
>> not clear that the memory was the problem :)
>>
>> Attached are the log and the Scala code of the transformation that I was
>> running.
>>
>> The data file I am processing is around 57M lines (~1.7GB).
>>
>> Let me know if you have any comment or suggestion.
>>
>> Thanks again,
>>
>> Frederick
>>
>>
>>
>> On Fri, Jan 15, 2016 at 10:01 AM, Robert Metzger <rmetzger@apache.org>
>> wrote:
>>
>>> Hi Frederick,
>>>
>>> sorry for the delayed response.
>>>
>>> I have no idea what the problem could be.
>>> Has the exception been thrown from the env.execute() call?
>>> Why did you set the akka.ask.timeout to 10k seconds?
>>>
>>>
>>>
>>>
>>> On Wed, Jan 13, 2016 at 2:13 AM, Frederick Ayala <
>>> frederickayala@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am having an error while running some Flink transformations in a
>>>> DataStream Scala API.
>>>>
>>>> The error I get is:
>>>>
>>>> Timeout while waiting for JobManager answer. Job time exceeded 21474835
>>>> seconds
>>>> ...
>>>>
>>>> Caused by: akka.pattern.AskTimeoutException: Ask timed out on
>>>> [Actor[akka://flink/user/$a#183984057]] after [21474835000 ms]
>>>>
>>>>
>>>> This happens after a couple of minutes. Not after 21474835 seconds...
>>>>
>>>> I tried different configurations but no result so far:
>>>>       val customConfiguration = new Configuration()
>>>>       customConfiguration.setInteger("parallelism", 8)
>>>>       customConfiguration.setInteger("jobmanager.heap.mb",2560)
>>>>       customConfiguration.setInteger("taskmanager.heap.mb",10240)
>>>>       customConfiguration.setInteger("taskmanager.numberOfTaskSlots",8)
>>>>
>>>> customConfiguration.setInteger("taskmanager.network.numberOfBuffers",16384)
>>>>       customConfiguration.setString("akka.ask.timeout","10000 s")
>>>>       customConfiguration.setString("akka.lookup.timeout","100 s")
>>>>       env =
>>>> ExecutionEnvironment.createLocalEnvironment(customConfiguration)
>>>>
>>>> Any idea what could it be the problem?
>>>>
>>>> Thanks!
>>>>
>>>> Frederick
>>>>
>>>
>>>
>>
>>
>> --
>> Frederick Ayala
>>
>
>

Mime
View raw message