flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chesnay Schepler <ches...@apache.org>
Subject Re: Streaming job monitoring
Date Thu, 08 Jun 2017 10:16:01 GMT
Hello Flavio,

I'm not sure what source you are using, but it looks like the 
ContinouosFileMonitoringSource which works with 2 operators.
The first operator (what is displayed as the actual Source) emits input 
splits (chunks of files that should be read) and passes
these to the second operator (split reader).

So the numRecordsOut of the source is the number of splits created.

For sinks, the numRecordsOut counter is essentially unused; mostly 
because it is kind of redundant as it should in general
be equal to the numRecordsIn counter. The same applies to the 
numRecordsIn counter of sources.
(although in this particular case it would be interesting to know how 
many files the source has read...)

This is something we would have to solve for each source/sink 
individually, which is kind of tricky as the numRecordsIn/-Out
metrics are internal metrics and not accessible in user-defined 
functions without casting.

In your case the reading of the chunks and writing by the sink is done 
in a single task. The webUI is not aware of operators
and thus can't display the individual metrics nicely.

The metrics tab doesn't aggregate metrics across subtasks, so I can see 
how that would be cumbersome to use. We can't solve
aggregation in general as when dealing with Gauges we just don't know 
whether we can aggregate them at all.
Frankly, this is also something I don't really won't to implement in the 
first place as there are dedicated systems for this
exact use-case. The WebFrontend is not meant as a replacement for these 

In general i would recommend to setup a dedicated metrics system like 
graphite/ganglia to store metrics and use grafana
or something similar to actually monitor them.


On 08.06.2017 11:43, Flavio Pompermaier wrote:
> Hi to all,
> we've successfully ran our first straming job on a Flink cluster (with 
> some problems with the shading of guava..) and it really outperforms 
> Logstash, from the point of view of indexing speed and easiness of use.
> However there's only one problem: when the job is running, in the Job 
> Monitoring UI, I see 2 blocks within the plan visualizer:
>  1. Source: Custom File Source (without any info about the file I'm
>     reading)
>  2. Split Reader: Custom File source -> Sink: unnamed
> None of them helps me to understand which data I'm reading or writing 
> (while within the batch jobs this is usually displayed). Moreover, in 
> the task details the "Byte sent/Records sent" are totally senseless, I 
> don't know what is counted (see the attached image if available)...I 
> see documents indexed on ES but in the Flink Job UI I don't see 
> anything that could help to understand how many documents are sent to 
> ES or from one function (Source) to the other (Sink).
> I tried to display some metrics and there I found something but I hope 
> this is not the usual way of monitoring streaming jobs...am I doing 
> something wrong? Or the streaming jobs should be monitored with 
> something else?
> Inline image 1
> Best,
> Flavio

View raw message