flink-user mailing list archives

From Chesnay Schepler <ches...@apache.org>
Subject Re: Flink Graphire Reporter stops reporting via TCP if network issue
Date Fri, 05 May 2017 12:20:26 GMT
Hello,

for Graphite, Flink uses the DropWizard metrics reporter. I don't know 
at the moment whether it supports any kind of reconnecting functionality.

I'm not sure whether I understood you correctly; did you try upgrading
the DropWizard metrics-core/metrics-graphite dependencies?

If that didn't do the trick we could in fact implement this in Flink,
though it would be a hack. When an error occurs we can simply
re-instantiate the reporter, but we would have to know how the reporter
communicates the connection drop, i.e. whether it throws an exception
or not.
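
A minimal sketch of the re-instantiation idea, assuming the connection
drop surfaces as an exception from the report call (which is exactly the
open question above). `Reporter` and `ReinstantiatingReporter` are
hypothetical names used only for illustration; they are not Flink or
DropWizard classes.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for a metrics reporter; not a Flink or DropWizard type. */
interface Reporter {
    void report() throws Exception;
    void close();
}

/** Wraps a reporter and rebuilds it from a factory whenever report() throws. */
final class ReinstantiatingReporter implements Reporter {
    private final Supplier<Reporter> factory;
    private Reporter delegate;
    private int rebuilds = 0;

    ReinstantiatingReporter(Supplier<Reporter> factory) {
        this.factory = factory;
        this.delegate = factory.get();
    }

    @Override
    public void report() {
        try {
            delegate.report();
        } catch (Exception e) {
            // Assumption: the failure is visible here as an exception.
            // Discard the broken instance and create a fresh one; the
            // next call to report() goes through the new instance.
            try {
                delegate.close();
            } catch (RuntimeException ignored) {
                // The old instance is being dropped anyway.
            }
            delegate = factory.get();
            rebuilds++;
        }
    }

    @Override
    public void close() {
        delegate.close();
    }

    /** Number of times the wrapped reporter has been replaced. */
    int rebuilds() {
        return rebuilds;
    }
}
```

If the underlying reporter swallows the error and only logs it, this
wrapper never sees the failure, so a different signal (for example a
failure counter exposed by the reporter) would be needed instead.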

Could you check the log for any warning statements from the MetricRegistry?

Regards,
Chesnay

On 05.05.2017 13:26, Bruno Aranda wrote:
> Hi,
>
> We are using the Graphite reporter from Flink 1.2.0 to send the 
> metrics via TCP. Due to our network configuration we cannot use UDP at 
> the moment.
>
> We have observed that if there is any problem with Graphite or the
> network (basically, the TCP connection times out or something), the
> metrics reporter does not recover. This is easy to reproduce by
> using iptables to block the port we are sending the metrics to. If we
> block the port for more than a minute or so, the problem will happen.
> After the port is reopened, Flink does not resume reporting as before.
>
> Is this a known issue? Googling shows some problems with the
> metrics-graphite package that should have been solved already. We have
> tried updating metrics-core/graphite to the latest version with no success.
>
> Any ideas?
>
> Thanks!
>
> Bruno


