flink-user mailing list archives

From Piotr Nowojski <pi...@data-artisans.com>
Subject Re: What's the meaning of "Registered `TaskManager` at akka://flink/deadLetters " ?
Date Mon, 15 Jan 2018 12:06:43 GMT
Hi,

Could you post the full job manager and task manager logs, from startup until the first signs of the problem?

Thanks, Piotrek

> On 15 Jan 2018, at 11:21, Reza Samee <reza.samee@gmail.com> wrote:
> 
> Thanks for the response, and sorry for the delay.
> 
> The ports logged by the JobManager and TaskManager are open!
> 
> 
> Is this log line OK?
> 2018-01-15 13:40:03,455 INFO  org.apache.flink.runtime.webmonitor.JobManagerRetriever - New leader reachable under akka.tcp://flink@172.16.20.18:6123/user/jobmanager:null.
> 
> When I kill the TaskManager, the JobManager logs:
> 2018-01-15 13:32:41,419 WARN  akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@stage_dbq_1:45532] has failed, address is now gated for [5000] ms. Reason: [Disassociated]
> 
> But it does not decrement the number of available TaskManagers!
> And when I start my single TaskManager again, it logs:
> 
> 2018-01-15 13:32:52,753 INFO  org.apache.flink.runtime.instance.InstanceManager - Registered TaskManager at ??? (akka://flink/deadLetters) as 626846ae27a833cb094eeeb047a6a72c. Current number of registered hosts is 2. Current number of alive task slots is 40.
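
When going through a JobManager log, registration lines like the one quoted above can be picked out mechanically. This is only a minimal sketch, assuming the message format is exactly as in that quoted line; the regular expression and the names `Registration` and `find_registrations` are hypothetical helpers, not part of Flink.

```python
import re
from typing import List, NamedTuple


class Registration(NamedTuple):
    """One 'Registered TaskManager' message (hypothetical helper type)."""
    host: str
    actor_path: str
    instance_id: str
    registered_hosts: int
    alive_slots: int


# Assumption: the message looks exactly like the line quoted in this thread.
_PATTERN = re.compile(
    r"Registered TaskManager at (\S+) \(([^)]*)\) as ([0-9a-f]+)\.\s+"
    r"Current number of registered hosts is (\d+)\.\s+"
    r"Current number of alive task slots is (\d+)"
)


def find_registrations(log_text: str) -> List[Registration]:
    """Return every TaskManager registration message found in log_text."""
    return [
        Registration(m.group(1), m.group(2), m.group(3),
                     int(m.group(4)), int(m.group(5)))
        for m in _PATTERN.finditer(log_text)
    ]


# The registration line quoted in this thread, used as sample input.
sample = (
    "2018-01-15 13:32:52,753 INFO  "
    "org.apache.flink.runtime.instance.InstanceManager - "
    "Registered TaskManager at ??? (akka://flink/deadLetters) as "
    "626846ae27a833cb094eeeb047a6a72c. Current number of registered hosts "
    "is 2. Current number of alive task slots is 40."
)

for reg in find_registrations(sample):
    # A path ending in /deadLetters is the symptom discussed in this thread.
    is_dead_letters = reg.actor_path.endswith("/deadLetters")
    print(reg.host, reg.actor_path, reg.registered_hosts,
          reg.alive_slots, is_dead_letters)
```

Running it over the whole log shows how the host and slot counts change across restarts, which is the behaviour described above (the count goes up on re-registration but does not go down when the TaskManager is killed).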
> 
> 
> On Wed, Jan 10, 2018 at 11:36 AM, Piotr Nowojski <piotr@data-artisans.com> wrote:
> Hi,
> 
> Search both the job manager and task manager logs for the IP address(es) and port(s) that have timed out. First of all, make sure that the nodes are visible to each other using a simple ping. Afterwards, please check that the ports that timed out are open and not blocked by a firewall (for example with telnet).
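
The telnet-style check can also be scripted. The following is a minimal sketch, not a Flink tool: it only tests whether a TCP connection to a given host and port can be opened. The helper name `port_is_open` is hypothetical, and the host and port in the usage comment are taken from the JobManagerRetriever log line quoted in this thread; substitute whatever addresses your own logs report.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


# Example usage (address taken from the log line quoted in this thread):
# print(port_is_open("172.16.20.18", 6123))
```

A successful connection only shows that the port is reachable from the machine running the script; it does not show that the Akka endpoint behind it accepts the TaskManager's registration.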
> 
> You can search the documentation for the configuration parameters with “port” in the name:
> https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/config.html
> But note that many of them are random by default.
> 
> Piotrek
> 
>> On 9 Jan 2018, at 17:56, Reza Samee <reza.samee@gmail.com> wrote:
>> 
>> 
>> I'm running a Flink cluster (a mini one with just one node), but the problem is that my TaskManager can't reach my JobManager!
>> 
>> Here are the logs from the TaskManager:
>> ...
>> Trying to register at JobManager akka.tcp://flink@MY_PRIV_IP/user/jobmanager (attempt 20, timeout: 30 seconds)
>> Trying to register at JobManager akka.tcp://flink@MY_PRIV_IP/user/jobmanager (attempt 21, timeout: 30 seconds)
>> Trying to register at JobManager akka.tcp://flink@MY_PRIV_IP/user/jobmanager (attempt 22, timeout: 30 seconds)
>> Trying to register at JobManager akka.tcp://flink@MY_PRIV_IP/user/jobmanager (attempt 23, timeout: 30 seconds)
>> Trying to register at JobManager akka.tcp://flink@MY_PRIV_IP/user/jobmanager (attempt 24, timeout: 30 seconds)
>> ...
>> 
>> My "JobManager UI" shows my TaskManager with this Path & ID: "akka://flink/deadLetters" (in the TaskManagers tab).
>> And I found these lines in my JobManager stdout:
>> 
>> Resource Manager associating with leading JobManager Actor[akka://flink/user/jobmanager#-275619168] - leader session null
>> TaskManager ResourceID{resourceId='1132cbdaf2d8204e5e42e321e8592754'} has started.
>> Registered TaskManager at MY_PRIV_IP (akka://flink/deadLetters) as 7d9568445b4557a74d05a0771a08ad9c. Current number of registered hosts is 1. Current number of alive task slots is 20.
>> 
>> 
>> What's the meaning of these lines? Where should I look for the solution?
>> 
>> 
>> 
>> 
>> -- 
>> رضا سامعی / http://samee.blog.ir
> 
> 
> 
> -- 
> رضا سامعی / http://samee.blog.ir
