flink-user mailing list archives

From "Yun Gao" <yungao...@aliyun.com>
Subject Re: After a flink streaming job has been running for a while (about one day), it will automatically restart.
Date Wed, 10 Jul 2019 02:27:02 GMT
Hi,

     The `Connection reset by peer` exception means the connection failed because a TCP
packet with the RST flag was received. There are two likely causes:
     1. A TaskManager connected to the one that throws this exception has shut down because
of some other exception.
     2. The underlying physical network is suffering from packet loss. When a single packet
is lost repeatedly, the sending side may eventually give up and reset the connection. This can
happen when CPU usage is too high to handle network card interrupts, or when the underlying
physical hardware has problems.
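
     As a rough way to look into case 2 on a Linux host, one could compare the kernel's TCP
retransmission counter with the number of segments sent. The sketch below is only an
illustration: the function name `tcp_retrans_ratio` is made up here, and it assumes the usual
layout of /proc/net/snmp (a `Tcp:` header line followed by a `Tcp:` value line). A high ratio
only hints at packet loss; it does not prove it.

```python
def tcp_retrans_ratio(snmp_text):
    """Return RetransSegs / OutSegs parsed from the text of /proc/net/snmp.

    Assumes the standard Linux layout: one "Tcp:" line with counter names,
    followed by one "Tcp:" line with the matching values.
    """
    tcp_lines = [line.split() for line in snmp_text.splitlines()
                 if line.startswith("Tcp:")]
    if len(tcp_lines) < 2:
        raise ValueError("no Tcp: header/value pair found")

    # Drop the leading "Tcp:" token and pair each counter name with its value.
    names, values = tcp_lines[0][1:], tcp_lines[1][1:]
    counters = dict(zip(names, (int(v) for v in values)))

    sent = counters["OutSegs"]
    return counters["RetransSegs"] / sent if sent else 0.0
```

     On a TaskManager machine you would pass it the contents of /proc/net/snmp, for example
`tcp_retrans_ratio(open("/proc/net/snmp").read())`. The counters are cumulative since boot,
so sampling twice and comparing the difference around the time of the failure is more telling
than a single reading.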

    Therefore, I think you should first check whether the TaskManager connected to this one
(it should be reported with the exception, and I think you can find it in the original log
file) had shut down when the exception was thrown. If not, you may need to check whether there
was packet loss when the exception happened.
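
    To find the remote TaskManager in the log, a small script along these lines might help.
It is only a sketch: `find_reset_peers` is a hypothetical helper, and the pattern
`Lost connection to task manager '...'` is an assumption about how the remote address is
printed, so please adjust it to whatever your log actually contains.

```python
import re

RESET_TEXT = "Connection reset by peer"
# Assumed message shape for the remote address; change it to match your log.
REMOTE_PATTERN = re.compile(r"Lost connection to task manager '([^']+)'")


def find_reset_peers(log_lines):
    """Return (line_number, remote_address_or_None) for each relevant log line."""
    hits = []
    for number, line in enumerate(log_lines, start=1):
        match = REMOTE_PATTERN.search(line)
        if match:
            # The line names the remote TaskManager explicitly.
            hits.append((number, match.group(1)))
        elif RESET_TEXT in line:
            # A reset was logged, but no remote address is on this line.
            hits.append((number, None))
    return hits
```

    Once you have the remote address, you can look at that TaskManager's own log around the
same timestamp to see whether it had already failed for another reason.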

Best,
Yun



------------------------------------------------------------------
From:mailtolrl <mailtolrl@126.com>
Send Time:2019 Jul. 10 (Wed.) 09:40
To:user <user@flink.apache.org>
Subject:After a flink streaming job has been running for a while (about one day), it will
automatically restart.

Hi all,
 I started a Flink streaming job, and it always restarts automatically after running for
a while (about 1 day).

The start job command: flink run -yd -m yarn-cluster -yqu myqueue  -yn 1 -yjm 1024 -ytm 2048
-ys 1 -p 30 myjar.jar someArgs

The restart config is:


And the running error message is:
1.
2.

3.


Each job is always automatically restarted because of the above error.

Thanks.






