From user-return-32745-archive-asf-public=cust-asf.ponee.io@flink.apache.org Thu Feb 20 09:56:15 2020 Return-Path: X-Original-To: archive-asf-public@cust-asf.ponee.io Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [207.244.88.153]) by mx-eu-01.ponee.io (Postfix) with SMTP id 6732918065D for ; Thu, 20 Feb 2020 10:56:14 +0100 (CET) Received: (qmail 89407 invoked by uid 500); 20 Feb 2020 09:56:12 -0000 Mailing-List: contact user-help@flink.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list user@flink.apache.org Received: (qmail 89398 invoked by uid 99); 20 Feb 2020 09:56:12 -0000 Received: from Unknown (HELO mailrelay1-lw-us.apache.org) (10.10.3.159) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 20 Feb 2020 09:56:12 +0000 Received: from mail-oi1-f177.google.com (mail-oi1-f177.google.com [209.85.167.177]) by mailrelay1-lw-us.apache.org (ASF Mail Server at mailrelay1-lw-us.apache.org) with ESMTPSA id 4634B4FBC for ; Thu, 20 Feb 2020 09:56:12 +0000 (UTC) Received: by mail-oi1-f177.google.com with SMTP id a142so26935072oii.7 for ; Thu, 20 Feb 2020 01:56:11 -0800 (PST) X-Gm-Message-State: APjAAAUtsg0wqFzzn8NoRTxqadwmo3QIQ6eagPj27R9/eY0bJrL/GlFY q8ylLOG12ZY+UU6xDjGmPV+MIqXCpObIMPTYjOM= X-Google-Smtp-Source: APXvYqyrPfNPk89jBQtjiV+MHtGhGsL6z/UpJgDoXe9hemOy4OBwVT3wisFNnJe0BVSwbEi6by5s76uKRlnf2eL4pF4= X-Received: by 2002:aca:4996:: with SMTP id w144mr1296890oia.111.1582192571429; Thu, 20 Feb 2020 01:56:11 -0800 (PST) MIME-Version: 1.0 References: In-Reply-To: From: Robert Metzger Date: Thu, 20 Feb 2020 10:55:54 +0100 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: Identifying Flink Operators of the Latency Metric To: Morgan Geldenhuys Cc: user Content-Type: multipart/alternative; boundary="0000000000002fba04059efeea9a" --0000000000002fba04059efeea9a Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Hey Morgan, I would query the Monitoring REST API: https://ci.apache.org/projects/flink/flink-docs-master/monitoring/rest_api.= html For example: GET http://localhost:8082/jobs/9a6748889bf24987495eead247aeb1ff Returns: 1. {jid: "9a6748889bf24987495eead247aeb1ff", name: "CarTopSpeedWindowingExample", isStoppable: false,=E2=80=A6} 1. jid: "9a6748889bf24987495eead247aeb1ff" 2. name: "CarTopSpeedWindowingExample" 3. isStoppable: false 4. state: "RUNNING" 5. start-time: 1582192403413 6. end-time: -1 7. duration: 18533 8. now: 1582192421946 9. timestamps: {FINISHED: 0, FAILING: 0, CANCELED: 0, SUSPENDED: 0, RUNNING: 1582192403550, RECONCILING: 0, FAILED: 0,=E2=80=A6} 10. vertices: [{id: "cbc357ccb763df2852fee8c4fc7d55f2", name: "Source: Custom Source -> Timestamps/Watermarks",=E2=80=A6},=E2=80=A6= ] 1. 0: {id: "cbc357ccb763df2852fee8c4fc7d55f2", name: "Source: Custom Source -> Timestamps/Watermarks",=E2=80=A6} 1. id: "cbc357ccb763df2852fee8c4fc7d55f2" 2. name: "Source: Custom Source -> Timestamps/Watermarks" 3. parallelism: 1 4. status: "RUNNING" 5. start-time: 1582192403754 6. end-time: -1 7. duration: 18192 8. tasks: {CREATED: 0, CANCELED: 0, RECONCILING: 0, FAILED: 0, CANCELING: 0, DEPLOYING: 0, RUNNING: 1,=E2=80=A6} 9. metrics: {read-bytes: 0, read-bytes-complete: true, write-bytes: 0, write-bytes-complete: true, read-records: 0,=E2= =80=A6} 2. 1: {id: "90bea66de1c231edf33913ecd54406c1",=E2=80=A6} 1. id: "90bea66de1c231edf33913ecd54406c1" 2. name: "Window(GlobalWindows(), DeltaTrigger, TimeEvictor, ComparableAggregator, PassThroughWindowFunction) -> Sink: Print to Std. Out " 3. parallelism: 1 4. status: "RUNNING" 5. start-time: 1582192403759 6. end-time: -1 7. duration: 18187 8. tasks: {CREATED: 0, CANCELED: 0, RECONCILING: 0, FAILED: 0, CANCELING: 0, DEPLOYING: 0, RUNNING: 1,=E2=80=A6} 9. metrics: {read-bytes: 4669, read-bytes-complete: true, write-bytes: 0, write-bytes-complete: true,=E2=80=A6} 11. status-counts: {CREATED: 0, CANCELED: 0, RECONCILING: 0, FAILED: 0, CANCELING: 0, DEPLOYING: 0, RUNNING: 2,=E2=80=A6} 12. plan: {jid: "9a6748889bf24987495eead247aeb1ff", name: "CarTopSpeedWindowingExample",=E2=80=A6} On Tue, Feb 18, 2020 at 5:01 PM Morgan Geldenhuys < morgan.geldenhuys@tu-berlin.de> wrote: > Hi All, > > I have setup monitoring for Flink (1.9.2) via Prometheus and am intereste= d > in viewing the end-to-end latency at the sink operators for the 95 > percentile. I have enabled latency markers at the operator level and can > see the results, one of the entries looks as follows: > > > flink_taskmanager_job_latency_source_id_operator_id_operator_subtask_inde= x_latency{app=3D"flink",component=3D"taskmanager",host=3D"flink_taskmanager= _6bdc8fc49_kr4bs",instance=3D" > 10.244.18.2:9999 > ",job=3D"kubernetes-pods",job_id=3D"96d32d8e380dc267bd69403fd7e20adf",job= _name=3D"Traffic",kubernetes_namespace=3D"default",kubernetes_pod_name=3D"f= link-taskmanager-6bdc8fc49-kr4bs",operator_id=3D"2e32dc82f03b1df764824a4773= 219c97",operator_subtask_index=3D"7",pod_template_hash=3D"6bdc8fc49",quanti= le=3D"0.95",source_id=3D"cbc357ccb763df2852fee8c4fc7d55f2",tm_id=3D"7fb02c0= ed734ed1815fa51373457434f"} > > That is great, however... I am unable to determine which of the operators > is the sink operator I'm looking for based solely on the operator_id. Is > there a way of determining this? > > Regards, > M. > --0000000000002fba04059efeea9a Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hey Morgan,


For exa= mple:
GET http://localhost:8082/jobs/9a6748889bf= 24987495eead247aeb1ff
Returns:
  1. {ji= d: "9a6748889bf24987495eead247aeb1ff", name: "CarTopSpeedWin= dowingExample", isStoppable: false,=E2=80=A6}
    1. jid:=C2= =A0"9a6748889bf24987495eead247aeb1ff"
    2. name:=C2=A0"CarTopSpeedWindowingExample"
    3. isStoppable:=C2=A0false
    4. state:=C2=A0"RUNNING"
    5. start-time:=C2=A01582192= 403413
    6. = end-time:=C2=A0-1
    7. du= ration:=C2=A018533
    8. = now:=C2=A01582192421946
    9. timestamps:=C2=A0{FINISHED: 0, FAILING: 0, CANCELED: 0, SUSPENDED: 0, RU= NNING: 1582192403550, RECONCILING: 0, FAILED: 0,=E2=80=A6}
    10. vertices:=C2=A0[{id: "cbc357ccb763df2852fee8c4fc7d55f2&= quot;, name: "Source: Custom Source -> Timestamps/Watermarks",= =E2=80=A6},=E2=80=A6]
      1. 0:=C2=A0{id: "cbc357ccb763= df2852fee8c4fc7d55f2", name: "Source: Custom Source -> Timesta= mps/Watermarks",=E2=80=A6}
        1. id:=C2=A0"cbc357ccb763df2852fee8c4fc7d55f2"
        2. name:=C2=A0"Source: Custom Source -> Timestamps/Wat= ermarks"<= /span>
        3. parallelism:=C2=A01
        4. status<= /span>:=C2=A0"RUNNING"
        5. start-time<= /span>:=C2=A015= 82192403754
        6. end-time:=C2=A0-1
        7. duration:=C2=A018192
        8. tasks<= /span>:=C2=A0{CREATED: 0, CANCELED:= 0, RECONCILING: 0, FAILED: 0, CANCELING: 0, DEPLOYING: 0, RUNNING: 1,=E2= =80=A6}
        9. metrics:= =C2=A0{read-bytes: 0, read-bytes-co= mplete: true, write-bytes: 0, write-bytes-complete: true, read-records: 0,= =E2=80=A6}
      2. 1:=C2=A0{id: "90bea6= 6de1c231edf33913ecd54406c1",=E2=80=A6}
        1. id:=C2= =A0"90bea66de1c231edf33913ecd54406c1"
        2. name:=C2=A0"Window(GlobalWindows(), DeltaT= rigger, TimeEvictor, ComparableAggregator, PassThroughWindowFunction) ->= Sink: Print to Std. Out"
        3. parallelism:=C2=A01
        4. =
        5. status:=C2=A0"RUNNING"
        6. start-time:=C2=A01582192403759
        7. end-time:= =C2=A0-1=
        8. duration:=C2=A018187
        9. tasks:=C2=A0{CR= EATED: 0, CANCELED: 0, RECONCILING: 0, FAILED: 0, CANCELING: 0, DEPLOYING: = 0, RUNNING: 1,=E2=80=A6}
        10. metrics:=C2=A0{read-bytes= : 4669, read-bytes-complete: true, write-bytes: 0, write-bytes-complete: tr= ue,=E2=80=A6}
    11. status-counts:=C2=A0{CREA= TED: 0, CANCELED: 0, RECONCILING: 0, FAILED: 0, CANCELING: 0, DEPLOYING: 0,= RUNNING: 2,=E2=80=A6}
    12. plan:=C2=A0{jid: "9a6748889bf24987495eead247aeb1ff&= quot;, name: "CarTopSpeedWindowingExample",=E2=80=A6}

On Tue, Feb 18, 2020 at 5:01 PM Morgan Geldenhuys &l= t;morgan.geldenhuys@tu-be= rlin.de> wrote:
=20 =20 =20
Hi All,

I have setup monitoring for Flink (1.9.2) via Prometheus and am interested in viewing the end-to-end latency at the sink operators for the 95 percentile. I have enabled latency markers at the operator level and can see the results, one of the entries looks as follows:

flink_taskmanager_job_latency_source_id_operator_id_operator_subt= ask_index_latency{app=3D"flink",component=3D"taskmanager&quo= t;,host=3D"flink_taskmanager_6bdc8fc49_kr4bs",instance=3D"10.244.18.2:9999&qu= ot;,job=3D"kubernetes-pods",job_id=3D"96d32d8e380dc267bd6940= 3fd7e20adf",job_name=3D"Traffic",kubernetes_namespace=3D&quo= t;default",kubernetes_pod_name=3D"flink-taskmanager-6bdc8fc49-kr4= bs",operator_id=3D"2e32dc82f03b1df764824a4773219c97",operato= r_subtask_index=3D"7",pod_template_hash=3D"6bdc8fc49",q= uantile=3D"0.95",source_id=3D"cbc357ccb763df2852fee8c4fc7d55= f2",tm_id=3D"7fb02c0ed734ed1815fa51373457434f"}

That is great, however... I am unable to determine which of the operators is the sink operator I'm looking for based solely on the operator_id. Is there a way of determining this?

Regards,
M.
--0000000000002fba04059efeea9a--