flink-user mailing list archives

From Nico Kruber <n...@data-artisans.com>
Subject Re: Correlation between data streams/operators and threads
Date Fri, 17 Nov 2017 08:22:04 GMT
Regarding 3:
a) The TaskManager logs are missing. Are there any?
b) Also, the JobManager logs say you have 4 slots available in total. Is this 
enough for your 5-device scenario?
c) The JobManager log, however, does not really reveal what it is currently 
doing. Can you set the log level to DEBUG to see more?
d) Also, do you still observe CPU load during the 15min as an indication that 
it is actually doing something?
e) During this 15min period where apparently nothing happens, can you provide 
the output of "jstack <jobmanager_pid>" (with the PID of your JobManager)?
f) You may be able to debug this further by running the job in your IDE in 
debug mode and pausing the execution when you suspect it hangs.
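
For e) and f), when the job runs inside a single local JVM (for example from the IDE), a thread dump similar to the jstack output can also be taken from within the process. The following is only a minimal sketch using the standard JDK ThreadMXBean API; the class name ThreadDumpSketch and the method dumpAllThreads are made up for illustration and are not part of Flink.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadDumpSketch {

    /** Builds a jstack-like dump of all live threads in the current JVM. */
    static String dumpAllThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        // false, false: skip monitor/synchronizer details, which not every JVM supports
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Print the dump, e.g. from a watchdog thread while the submission appears to hang
        System.out.print(dumpAllThreads());
    }
}
```

For a standalone JobManager process, "jstack <jobmanager_pid>" remains the simpler option, since it needs no code changes.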


Nico

On Tuesday, 14 November 2017 14:27:36 CET Piotr Nowojski wrote:
> 3. Nico, can you take a look at this one? Isn’t this a blob server issue?
> 
> Piotrek
> 
> > On 14 Nov 2017, at 11:35, Shailesh Jain <shailesh.jain@stellapps.com>
> > wrote:
> > 
> > 3. Have attached the logs and exception raised (15min - configured akka
> > timeout) after submitting the job.
> > 
> > Thanks,
> > Shailesh
> > 
> > 
> > On Tue, Nov 14, 2017 at 2:46 PM, Piotr Nowojski <piotr@data-artisans.com>
> > wrote:
> > 
> > Hi,
> > 
> > 3. Can you show the logs from job manager and task manager?
> > 
> >> On 14 Nov 2017, at 07:26, Shailesh Jain <shailesh.jain@stellapps.com>
> >> wrote:
> >> 
> >> Hi Piotrek,
> >> 
> >> I tried out option 'a' mentioned above, but instead of separate jobs, I'm
> >> creating separate streams per device. Following is the test deployment
> >> configuration as a local cluster (8GB ram, 2.5 GHz i5, ubuntu machine):
> >> 
> >> akka.client.timeout 15 min
> >> jobmanager.heap.mb 1024
> >> jobmanager.rpc.address localhost
> >> jobmanager.rpc.port 6123
> >> jobmanager.web.port 8081
> >> metrics.reporter.jmx.class org.apache.flink.metrics.jmx.JMXReporter
> >> metrics.reporter.jmx.port 8789
> >> metrics.reporters jmx
> >> parallelism.default 1
> >> taskmanager.heap.mb 1024
> >> taskmanager.memory.preallocate false
> >> taskmanager.numberOfTaskSlots 4
> >> 
> >> The number of Operators per device stream is 4 (one sink function, 3 CEP
> >> operators).
> >> 
> >> Observations (and questions):
> >> 
> >> 3. Job deployment hangs (never switches to RUNNING) when the number of
> >> devices is greater than 5. Increasing the akka client timeout does not
> >> help. Would deploying separate jobs per device, instead of separate
> >> streams, help here?
> >> 
> >> Thanks,
> >> Shailesh