flink-user mailing list archives

From Till Rohrmann <trohrm...@apache.org>
Subject Re: Fat jar fails deployment (streaming job too large)
Date Tue, 27 Feb 2018 13:27:07 GMT
Hi Niels,

the size of the jar itself does not matter to Flink. What could be a problem
is that the serialized `JobGraph` (user code with closures) is larger than
10 MB and thus exceeds Akka's default maximum frame size. In that case it
cannot be sent to the `JobMaster`. You can control the frame size via
`akka.framesize`.
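
As a rough sketch, the setting goes into `flink-conf.yaml`. The values below are only illustrative, not recommendations, and I am assuming the file is picked up by both the client and the JobManager (the cluster most likely needs a restart afterwards). The second key is the client timeout Fabian mentions in the quoted message below.

```yaml
# flink-conf.yaml (illustrative values, adjust to your job)

# Maximum size of a single Akka message; the default is 10485760b (10 MB).
akka.framesize: 20971520b

# How long the client waits for the JobManager to answer; the default is 60 s.
akka.client.timeout: 120 s
```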

In order to debug the problem properly, I would need access to the client
log and the JobManager logs if possible.
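
Until the logs are available, one rough way to test the size hypothesis is to Java-serialize the suspicious user objects (for example the new serializers or functions) and compare the result with the 10 MB default. The sketch below uses only the JDK; the class `SerializedSizeCheck` and its methods are hypothetical helpers, not part of Flink, and the number is only an approximation of what ends up in the `JobGraph`.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical helper, not a Flink API: estimates how large an object
// becomes under plain Java serialization.
public class SerializedSizeCheck {

    // Default akka.framesize mentioned above: 10 MB.
    static final long DEFAULT_FRAMESIZE_BYTES = 10L * 1024 * 1024;

    // Serializes the object into memory and returns the number of bytes written.
    static long serializedSize(Serializable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    // True if the serialized form alone would not fit into one frame.
    static boolean exceedsFrame(Serializable obj, long frameBytes) {
        return serializedSize(obj) > frameBytes;
    }

    public static void main(String[] args) {
        // Stand-in for a user function that captures a large lookup table.
        byte[] capturedState = new byte[11 * 1024 * 1024];
        System.out.println("serialized bytes: " + serializedSize(capturedState));
        System.out.println("exceeds default frame: "
                + exceedsFrame(capturedState, DEFAULT_FRAMESIZE_BYTES));
    }
}
```

If a single function or serializer already comes close to the limit, it is probably capturing more state in its closure than intended.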

Cheers,
Till

On Tue, Feb 27, 2018 at 11:05 AM, Fabian Hueske <fhueske@gmail.com> wrote:

> Hi Niels,
>
> There should be no size constraints on the complexity of an application or
> the size of a JAR file.
> The problem that you describe sounds a bit strange and should be fixed.
>
> Apparently, it has to spend more time on planning / submitting the
> application than before.
> Have you tried to increase the akka.client.timeout parameter?
>
> If that does not help, it would be good to learn what the JobManager is
> doing after the application was submitted.
> Either it just takes longer than before such that the client timeout is
> exceeded or it might even get stuck in some kind of deadlock (which would
> be bad).
> In that case it might help to take a few stack traces of the JM process
> after the application was submitted to check if the threads are making
> progress.
>
> I'll also include Till who is more familiar with the submission process
> and JM planning and coordination.
>
> Best, Fabian
>
>
> 2018-02-27 9:31 GMT+01:00 Niels <nielsdenissen@gmail.com>:
>
>> Hi All,
>>
>> We've been using Flink 1.3.2 for a while now, but recently failed to deploy
>> our fat jar to the cluster. The deployment only works when we remove 2
>> arbitrary operators, thus giving us the impression our job is too large.
>> However, we only changed some case classes and serializers (to support Avro)
>> compared to a working version of our jar. I'll provide some context below.
>>
>> *Streaming operators used: *(same list as when deploy worked)
>> - 9 Incoming streams from Kafka (all parsed from JSON -> Case Classes)
>> - 6 Stateful Joins (extend CoProcessFunction)
>> - 4 Stateful Processors (extend ProcessFunction)
>> - 5 Maps
>> - 2 Filters
>> - 1 Union of 3 Streams
>> - 1 Sink to Kafka (Case class -> JSON)
>>
>> *Changes made:*
>> - add extended Type Serializer for Avro support
>> - add companion objects to case classes for translation to Avro Generic
>> Records
>> - alter stateful functions to use above changes
>>
>> *What does work:*
>> - remove 2 arbitrary operators and deploy fat jar
>> - run full program using sbt run locally
>>
>> Could it be that the complexity somehow causes deploying the job as a jar
>> to fail? We simply get a timeout from Flink's CLI when trying to deploy,
>> even when extending the timeout to several minutes.
>>
>> Any help would be very much appreciated!
>>
>> Thanks,
>> Niels
>>
>>
>>
>> --
>> Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
>>
>
>
