flink-user mailing list archives

From Ovidiu-Cristian MARCU <ovidiu-cristian.ma...@inria.fr>
Subject Re: Monitoring REST API
Date Wed, 21 Dec 2016 23:03:28 GMT
Hi Lydia,

I have used sar monitoring (sar -u -n DEV -p -d -r 1) and plotted the average over multiple
nodes.

1) So for each node you can collect the sar output and obtain, for example:

Linux 3.2.0-4-amd64 (parasilo-4.rennes.grid5000.fr) 	2016-01-27 	_x86_64_	(16 CPU)
12:54:09        CPU     %user     %nice   %system   %iowait    %steal     %idle
12:54:10        all      4.63      0.00      3.25      0.13      0.00     91.99
12:54:09    kbmemfree kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit  kbactive   kbinact
12:54:10    129538812   2525308      1.91      1292     85876   3662636      2.69   2111652     55132
12:54:09          DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
12:54:10          sda     28.71   2708.91     87.13     97.38      0.03      1.10      0.97      2.77
12:54:09        IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s
12:54:10         eth0    632.67    587.13   3173.60     58.47      0.00      0.00      0.00
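
As a rough illustration of how output like the sample above could be turned into records for later averaging, here is a minimal Python sketch. It is not the script used for the numbers in this mail. The function name parse_sar is hypothetical, and the sketch assumes 24-hour HH:MM:SS timestamps and that each sar row sits on a single line (no mail-client wrapping).

```python
import re

# Assumption: 24-hour timestamps (HH:MM:SS), as in the sample output.
TIME_RE = re.compile(r"^\d{2}:\d{2}:\d{2}$")


def _is_number(token):
    """Return True if the token parses as a float."""
    try:
        float(token)
        return True
    except ValueError:
        return False


def parse_sar(text):
    """Parse sar text output into (time, section, row) tuples.

    A line whose fields after the timestamp are all non-numeric is treated
    as a column header; the following lines are data rows for that header.
    The section name is the first header column (CPU, kbmemfree, DEV, IFACE).
    """
    records = []
    header = None
    for line in text.splitlines():
        tokens = line.split()
        # Skip blank lines, the "Linux ..." banner and "Average:" rows.
        if len(tokens) < 2 or not TIME_RE.match(tokens[0]):
            continue
        fields = tokens[1:]
        if not any(_is_number(t) for t in fields):
            header = fields
            continue
        # Ignore rows that do not line up with the current header.
        if header is None or len(fields) != len(header):
            continue
        row = {
            name: float(value) if _is_number(value) else value
            for name, value in zip(header, fields)
        }
        records.append((tokens[0], header[0], row))
    return records
```

Each record keeps the device or interface label (for example sda or eth0) as a string and converts everything else to float, so the per-node records can be grouped by timestamp afterwards.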

2) Calculate the average over your nodes (make sure their clocks are synchronized) and obtain
a final output over which you can run some plot scripts:

LINE      DATE      FILENAME                 CPU_user  CPU_SYS   KBMEMFREE KBMEMUSED MEMUSED   DISK_UTIL DISK_RKBs DISK_WKBs _IO_RSTs  _IO_WSTs
1         12:54:10  res1Avg                  6.12      1.25      129554704 2509412   1.90      6.00      4253.63   87.04     3944.00   88.00
2         12:54:11  res1Avg                  3.41      0.28      129523432 2540690   1.92      4.00      2335.82   51.62     2692.00   0.00
3         12:54:12  res1Avg                  0.06      0.03      129522000 2542120   1.92      1.60      0.16      0.59      2048.00   32.00
4         12:54:13  res1Avg                  0.09      0.06      129520936 2543182   1.92      0.60      0.19      0.59      2048.00   0.00
5         12:54:14  res1Avg                  0.06      0.06      129518448 2545670   1.93      6.80      4.31      169.47    4044.00   16.00
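
The averaging step could look roughly like the following Python sketch. It is only an illustration under stated assumptions, not the script that produced the table above: the function name average_over_nodes, the input layout (node name to timestamp to metric values) and the metric names in the usage note are all hypothetical, and the node clocks are assumed to be synchronized so that equal timestamps refer to the same second.

```python
from collections import defaultdict


def average_over_nodes(per_node):
    """Average each metric per timestamp across all nodes.

    per_node maps node name -> {timestamp: {metric: value}}.
    A node that has no sample for a timestamp or metric simply does not
    contribute to that average.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for samples in per_node.values():
        for timestamp, metrics in samples.items():
            for name, value in metrics.items():
                sums[timestamp][name] += value
                counts[timestamp][name] += 1
    # Sort by timestamp so the result can be written out line by line.
    return {
        timestamp: {
            name: sums[timestamp][name] / counts[timestamp][name]
            for name in sums[timestamp]
        }
        for timestamp in sorted(sums)
    }
```

For example, with two nodes reporting cpu_user values of 4.0 and 8.0 at 12:54:10, the result for that timestamp has cpu_user equal to 6.0. The sorted output can then be written in a fixed-width layout like the one above and passed to a plotting script.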

For other metrics specific to Flink's execution you may need to rely on the various metrics
Flink currently exposes.

Best,
Ovidiu

> On 21 Dec 2016, at 19:55, Lydia Ickler <icklerly@googlemail.com> wrote:
> 
> Hi all,
> 
> I have a question regarding the Monitoring REST API;
> 
> I want to analyze the behavior of my program with regard to I/O MiB/s, Network MiB/s
> and CPU % as the authors of this paper did (https://hal.inria.fr/hal-01347638v2/document).
> From the JSON file at http://master:8081/jobs/jobid/ I get a summary including the information
> of read/write records and read/write bytes.
> Unfortunately the entries of Network or CPU are either (unknown) or 0.0. I am running
> my program on a cluster with up to 32 nodes.
> 
> Where can I find the values for e.g. CPU or Network?
> 
> Thanks in advance!
> Lydia
> 

