flink-user mailing list archives

From Piotr Nowojski <pi...@ververica.com>
Subject Re: Collecting operators real output cardinalities as json files
Date Mon, 25 May 2020 17:54:32 GMT
Hi Francesco,

Have you taken a look at the metrics [1], and the IO metrics [2] in particular? You can use one
of the pre-existing metric reporters [3] or implement a custom one. You could export the metrics
to a third-party system and get the JSON from there, or export them to JSON directly via a
custom metric reporter.
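
As a rough illustration of the post-processing side, here is a minimal sketch in Python. It assumes you already have a snapshot of metric identifiers and values from whichever reporter or third-party system you choose. The `snapshot` dictionary and the `output_cardinalities` function are hypothetical names, the sample values are made up, and the parsing assumes the default operator scope format (`<host>.taskmanager.<tm_id>.<job_name>.<operator_name>.<subtask_index>`) with no dots inside job or operator names. `numRecordsOut` is the IO metric to look at.

```python
import json
from collections import defaultdict


def output_cardinalities(snapshot):
    """Sum numRecordsOut over all subtasks, keyed by (job, operator).

    Assumes identifiers end in <job_name>.<operator_name>.<subtask_index>.<metric>,
    i.e. the default operator scope format, with no dots inside the names.
    """
    totals = defaultdict(int)
    for identifier, value in snapshot.items():
        parts = identifier.split(".")
        # Skip anything that is not an output record counter.
        if len(parts) < 4 or parts[-1] != "numRecordsOut":
            continue
        job_name, operator_name = parts[-4], parts[-3]
        totals[(job_name, operator_name)] += int(value)
    return [
        {"job": job, "operator": operator, "numRecordsOut": count}
        for (job, operator), count in sorted(totals.items())
    ]


# Hypothetical snapshot; in practice this would come from your reporter's output.
snapshot = {
    "host1.taskmanager.tm-1.WordCount.Map.0.numRecordsOut": 120,
    "host1.taskmanager.tm-1.WordCount.Map.1.numRecordsOut": 80,
    "host1.taskmanager.tm-1.WordCount.Reduce.0.numRecordsOut": 15,
    "host1.taskmanager.tm-1.WordCount.Map.0.numRecordsIn": 200,
}

report = output_cardinalities(snapshot)
report_json = json.dumps(report, indent=2)  # write this string to a file as needed
```

Summing over subtasks gives one output cardinality per operator, which is comparable to the per-operator estimates in the execution plan. Nothing here touches the job source code.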

Piotrek

[1] https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/metrics.html
[2] https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/metrics.html#io
[3] https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/metrics.html#reporter

> On 23 May 2020, at 11:31, Francesco Ventura <francesco.ventura@campus.tu-berlin.de> wrote:
> 
> Hi everybody,
> 
> I would like to collect the statistics and the real output cardinalities of the execution
> of many jobs as JSON files. I know that there is a REST interface that can be used, but I was
> looking for something simpler. In practice, I would like to get the information shown in
> the WebUI at runtime about a job and store it as a file. I am using env.getExecutionPlan()
> to get the execution plan of a job with the estimated cardinalities for each operator. However,
> it includes only estimated cardinalities, and it can be used only before calling env.execute().
> 
> Is there a similar way to extract the real output cardinalities of each pipeline after
> the execution?
> Is there a place where the Flink cluster stores the history of the information about
> executed jobs?
> Is developing a REST client to extract such information the only possible way?
> 
> I would also like to avoid adding counters to the job source code, since I am monitoring
> the runtime execution and should avoid anything that could interfere.
> 
> Maybe it is a trivial problem, but I had a quick look around and could not find a solution.
> 
> Thank you very much,
> 
> Francesco

