flink-user mailing list archives

From Chesnay Schepler <ches...@apache.org>
Subject Re: Tame Flink UI?
Date Wed, 23 Nov 2016 15:53:09 GMT
Hello,

So there are 2 separate issues here:

 1. The response when requesting the list of available metrics is quite
    large.
 2. The request for the values of these metrics is also large, and the
    response even larger.

For now I will modify the WebUI to request only the values of selected 
metrics, which will typically be a rather small number. This
should solve issue #2. (Provided no one has the funny idea of selecting 
all metrics!)
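As a rough illustration of why this helps (the metric names and the single `get` query parameter below are hypothetical, chosen only to show the size difference, not Flink's actual REST format):

```python
def build_query(metric_names):
    """Build a hypothetical 'get' query parameter enumerating metric names."""
    return "get=" + ",".join(metric_names)

# All IO metric names for 90 subtasks (illustrative names).
io_metrics = ["numRecordsIn", "numRecordsOut", "numBytesIn", "numBytesOut",
              "numBytesInLocal", "numBytesInRemote"]
all_metrics = [f"{subtask}.{name}"
               for subtask in range(90) for name in io_metrics]

# Only the metrics a user actually selected in the UI.
selected = ["0.numRecordsIn", "0.numRecordsOut"]

print(len(build_query(all_metrics)))  # several kilobytes for IO metrics alone
print(len(build_query(selected)))     # a few dozen bytes
```

Enumerating every name pushes the URL past proxy limits (hence the 413s), while a user's selection stays tiny.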

To fix #1 we will have to go with a more compact representation, I guess; 
this however will require a bit more work, since the
backend has to detect the valid ranges (i.e. the subtasks for which we do 
in fact have the metric).
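One way such a compact representation could look: once the backend knows which subtasks actually report a metric, the per-subtask names can be collapsed into index ranges. A minimal sketch (the encoding is illustrative, not an actual Flink format):

```python
def compress_subtasks(indices):
    """Collapse a sorted list of subtask indices into 'a-b' range strings."""
    ranges = []
    start = prev = indices[0]
    for i in indices[1:]:
        if i == prev + 1:
            prev = i  # extend the current run
        else:
            ranges.append((start, prev))  # close the run, start a new one
            start = prev = i
    ranges.append((start, prev))
    return ",".join(f"{a}-{b}" if a != b else str(a) for a, b in ranges)

# Instead of "0.numRecordsIn,1.numRecordsIn,...,89.numRecordsIn":
print(compress_subtasks(list(range(90))) + ".numRecordsIn")  # 0-89.numRecordsIn
```

Gaps (subtasks missing the metric) simply produce multiple ranges, e.g. `0-2,5-6`, which is exactly the range detection the backend would have to perform.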

Note that #1 should not be such a big problem, since the list of metrics 
is not updated regularly as far as I know.
If one doesn't interact with the metrics tab, no request (or maybe one at 
startup) should be sent.

Regards,
Chesnay

On 16.11.2016 20:38, Cliff Resnick wrote:
> Ufuk,
>
> The above occurs for me simply by selecting a running job from the job 
> list.
>
> Chesnay,
>
> The 413 error is because of the large request size. Given all the 
> repeated parameter names maybe a more compact representation would 
> work? For example, instead of enumerating all metrics, maybe ask for 
> the range?
>
> On Wed, Nov 16, 2016 at 2:05 PM, Chesnay Schepler <chesnay@apache.org 
> <mailto:chesnay@apache.org>> wrote:
>
>     Hello,
>
>     The WebInterface first pulls a list of all available metrics for
>     a specific taskmanager/job/task (which is reasonable, since how
>     else would you select them),
>     and then requests the values for all metrics by supplying the name
>     of every single metric it just received, which is where things get
>     funky.
>
>     In this case we have a task with parallelism of around 90 (the
>     number before the metric name is the subtask index).
>     Now let's only consider IO metrics (numRecordsIn etc.).
>     We then have 90 * 6 (task IO metrics) + 90 * 4 (operator IO
>     metrics) * X (# of operators in the task) metrics.
>     In the best case of a single operator this results in 900
>     metrics being pulled at once,
>     which is done every few seconds; I don't know the exact update
>     interval.
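The arithmetic above can be checked directly (counts as stated in the mail; the polling interval is only approximate):

```python
# Per-subtask IO metric counts, as given in the mail.
task_io_metrics = 6      # task IO metrics per subtask
operator_io_metrics = 4  # operator IO metrics per subtask, per operator
subtasks = 90
operators = 1            # best case: a single operator in the task

total = subtasks * task_io_metrics + subtasks * operator_io_metrics * operators
print(total)  # 900
```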
>
>     We can disable this temporarily in a few ways; the easiest one
>     being to simply never return any metrics in the initial metrics
>     look-up.
>     See AbstractMetricsHandler#getAvailableMetricsList
>
>     Regards,
>     Chesnay
>
>     On 16.11.2016 19:15, Ufuk Celebi wrote:
>
>         Hey Cliff,
>
>         yes, this has recently been merged to the master branch. I
>         think you are right that this is not feasible. I thought that
>         the metrics are pulled in selectively when you select them via
>         the metrics list. That seems not to be the case.
>
>         If it is really the case that everything is always requested
>         then we would have to revert this for the time being. Did you
>         select any metrics manually?
>
>         – Ufuk
>
>         On 16 November 2016 at 18:40:52, Cliff Resnick
>         (cresny@gmail.com <mailto:cresny@gmail.com>) wrote:
>
>             We're on 1.2-SNAPSHOT, and some time over the past couple
>             of weeks the UI
>             seems to have become much more aggressive polling for
>             metrics. I'm seeing
>             hundreds of 413 errors as the UI continuously tries to GET
>             with a URL over
>             100k, pretty much overwhelming the SOCKS proxy connection.
>             Below is an example of a JavaScript GET that seems to be
>             emitted every few
>             seconds. Is this intentional? <removed by Chesnay>
>


