flink-user mailing list archives

From Stephan Ewen <se...@apache.org>
Subject Re: JM/TM startup time
Date Fri, 02 Oct 2015 15:44:21 GMT
Is that a new observation that it takes so long, or has it always taken so
long?

On Fri, Oct 2, 2015 at 5:40 PM, Robert Schmidtke <ro.schmidtke@gmail.com>
wrote:

> I figured the JM would be waiting for the TMs. Each of my nodes has 64G of
> memory available.
>
> On Fri, Oct 2, 2015 at 5:38 PM, Maximilian Michels <mxm@apache.org> wrote:
>
>> Hi Robert,
>>
>> During startup, the task manager allocates the entire managed memory.
>>
>> From the log:
>> 17:03:33,554 INFO  org.apache.flink.runtime.taskmanager.TaskManager - Using 0.7 of the currently free heap space for Flink managed heap memory (34395 MB).
>>
>> It seems like you are allocating almost 35 GB of memory which might
>> take a bit (40 seconds still seems like too much time). What
>> configuration did you use for the task managers? Do you really have
>> that much memory or is your system swapping?
>>
>> I think the JobManager just appears to take a long time because the
>> TaskManagers register late.
>>
>> Cheers,
>> Max
>>
>> On Fri, Oct 2, 2015 at 5:26 PM, Robert Schmidtke <ro.schmidtke@gmail.com>
>> wrote:
>> > Hi everyone,
>> >
>> > I'm wondering about the startup times of the TMs:
>> >
>> > ...
>> > 17:03:33,255 INFO  org.apache.flink.runtime.taskmanager.TaskManager - Starting TaskManager actor
>> > 17:03:33,262 INFO  org.apache.flink.runtime.io.network.netty.NettyConfig - NettyConfig [server address: cumu02-05/130.73.144.64, server port: 45731, memory segment size (bytes): 32768, transport type: NIO, number of server threads: 0 (use Netty's default), number of client threads: 0 (use Netty's default), server connect backlog: 0 (use Netty's default), client connect timeout (sec): 120, send/receive buffer size (bytes): 0 (use Netty's default)]
>> > 17:03:33,266 INFO  org.apache.flink.runtime.taskmanager.TaskManager - Messages between TaskManager and JobManager have a max timeout of 100000 milliseconds
>> > 17:03:33,268 INFO  org.apache.flink.runtime.taskmanager.TaskManager - Temporary file directory '/tmp': total 44 GB, usable 37 GB (84.09% usable)
>> > 17:03:33,295 INFO  org.apache.flink.runtime.io.network.buffer.NetworkBufferPool - Allocated 64 MB for network buffer pool (number of memory segments: 2048, bytes per segment: 32768).
>> > 17:03:33,554 INFO  org.apache.flink.runtime.taskmanager.TaskManager - Using 0.7 of the currently free heap space for Flink managed heap memory (34395 MB).
>> >
>> > // almost 40 seconds //
>> >
>> > 17:04:12,445 INFO  org.apache.flink.runtime.io.disk.iomanager.IOManager - I/O manager uses directory /tmp/flink-io-922d9bf4-254e-41e7-b151-525157cd5bfe for spill files.
>> > 17:04:12,455 INFO  org.apache.flink.runtime.filecache.FileCache - User file cache uses directory /tmp/flink-dist-cache-792cf7f2-e2be-4950-a39f-d7a21326f054
>> > 17:04:12,617 INFO  org.apache.flink.runtime.taskmanager.TaskManager - Starting TaskManager actor at akka://flink/user/taskmanager#1341641688.
>> > 17:04:12,617 INFO  org.apache.flink.runtime.taskmanager.TaskManager - TaskManager data connection information: cumu02-05.zib.de (dataPort=45731)
>> > 17:04:12,618 INFO  org.apache.flink.runtime.taskmanager.TaskManager - TaskManager has 16 task slot(s).
>> > 17:04:12,618 INFO  org.apache.flink.runtime.taskmanager.TaskManager - Memory usage stats: [HEAP: 35502/49216/49216 MB, NON HEAP: 25/52/214 MB (used/committed/max)]
>> > 17:04:12,623 INFO  org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://flink@130.73.144.59:6123/user/jobmanager (attempt 1, timeout: 500 milliseconds)
>> > 17:04:12,773 INFO  org.apache.flink.runtime.taskmanager.TaskManager - Successful registration at JobManager (akka.tcp://flink@130.73.144.59:6123/user/jobmanager), starting network stack and library cache.
>> > ...
>> >
>> >
>> > The same goes for the JM (obviously).
>> >
>> > ...
>> > 17:03:31,632 INFO  org.apache.flink.runtime.jobmanager.JobManager - Starting JobManger web frontend
>> > 17:03:31,636 INFO  org.apache.flink.runtime.jobmanager.web.WebInfoServer - Setting up web info server, using web-root directory jar:file:/nfs/csr/bzcschmi/flink/flink-dist/target/flink-0.10-SNAPSHOT-bin/flink-0.10-SNAPSHOT/lib/flink-dist-0.10-SNAPSHOT.jar!/web-docs-infoserver.
>> > 17:03:31,753 INFO  org.eclipse.jetty.util.log - jetty-0.10-SNAPSHOT
>> > 17:03:31,806 INFO  org.eclipse.jetty.util.log - Started SelectChannelConnector@0.0.0.0:8081
>> > 17:03:31,806 INFO  org.apache.flink.runtime.jobmanager.web.WebInfoServer - Started web info server for JobManager on 0.0.0.0:8081
>> >
>> > // almost 35 seconds //
>> >
>> > 17:04:05,091 INFO  org.apache.flink.runtime.instance.InstanceManager - Registered TaskManager at cumu02-02 (akka.tcp://flink@130.73.144.61:53549/user/taskmanager) as e5ae92397a912c7360524524cf2d172a. Current number of registered hosts is 1. Current number of alive task slots is 16.
>> > ...
>> >
>> >
>> > Is this to be expected? Any ideas what's happening in the meantime? I'm
>> > asking because I'm running into errors when submitting my job too early
>> > (and not enough TMs have connected).
>> >
>> > Cheers
>> > Robert
>> >
>> > --
>> > My GPG Key ID: 336E2680
>>
>
>
>
> --
> My GPG Key ID: 336E2680
>
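
The memory figures quoted in the thread can be cross-checked with a little arithmetic. The sketch below is only an illustration: the helper name `implied_free_heap_mb` is hypothetical, and the only inputs are numbers copied from the quoted TaskManager log (the 0.7 fraction, the 34395 MB of managed memory, and the 49216 MB maximum heap).

```python
# Values copied from the quoted TaskManager log lines.
MANAGED_FRACTION = 0.7   # "Using 0.7 of the currently free heap space"
MANAGED_MB = 34395       # "(34395 MB)"
HEAP_MAX_MB = 49216      # "HEAP: 35502/49216/49216 MB" (used/committed/max)


def implied_free_heap_mb(managed_mb: float, fraction: float) -> float:
    """Free heap that would yield `managed_mb` at the given fraction.

    Hypothetical helper for illustration; it is not part of Flink.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return managed_mb / fraction


free_mb = implied_free_heap_mb(MANAGED_MB, MANAGED_FRACTION)
print(f"implied free heap at allocation time: {free_mb:.0f} MB")
print(f"share of max heap that was free: {free_mb / HEAP_MAX_MB:.1%}")
```

The implied free heap is only slightly below the reported maximum heap, which is consistent with a TaskManager heap of roughly 48 GB on a 64 GB node. This does not by itself settle whether the machine was swapping while the memory was being allocated.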

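Pauses like the ones marked "almost 40 seconds" in the quoted logs can be located mechanically. The following is a minimal sketch, assuming each log record starts with an `HH:MM:SS,mmm` timestamp as in the lines above and that the log does not cross midnight. The function name `find_gaps` and the 5-second threshold are illustrative choices, not anything taken from Flink.

```python
import re
from datetime import datetime

# Matches the leading "HH:MM:SS,mmm" timestamp seen in the quoted log lines.
TIMESTAMP = re.compile(r"^(\d{2}:\d{2}:\d{2},\d{3})\s")


def find_gaps(lines, threshold_seconds=5.0):
    """Return (gap_seconds, previous_line, next_line) for each long pause.

    Lines without a leading timestamp (wrapped continuations) are skipped.
    Midnight rollover is not handled in this sketch.
    """
    gaps = []
    prev_time = None
    prev_line = None
    for line in lines:
        match = TIMESTAMP.match(line)
        if not match:
            continue
        current = datetime.strptime(match.group(1), "%H:%M:%S,%f")
        if prev_time is not None:
            delta = (current - prev_time).total_seconds()
            if delta > threshold_seconds:
                gaps.append((delta, prev_line, line))
        prev_time, prev_line = current, line
    return gaps


# Three records copied from the TaskManager log quoted in this thread.
SAMPLE = [
    "17:03:33,295 INFO  org.apache.flink.runtime.io.network.buffer.NetworkBufferPool - Allocated 64 MB for network buffer pool (number of memory segments: 2048, bytes per segment: 32768).",
    "17:03:33,554 INFO  org.apache.flink.runtime.taskmanager.TaskManager - Using 0.7 of the currently free heap space for Flink managed heap memory (34395 MB).",
    "17:04:12,445 INFO  org.apache.flink.runtime.io.disk.iomanager.IOManager - I/O manager uses directory /tmp/flink-io-922d9bf4-254e-41e7-b151-525157cd5bfe for spill files.",
]

for seconds, before, after in find_gaps(SAMPLE):
    print(f"{seconds:.3f} s pause after: {before[:60]}")
```

On the sample, the only pause above the threshold falls between the managed-memory message and the I/O manager message, which matches the gap pointed out in the original question.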