Hey Morgan,

I would query the Monitoring REST API, whose job details endpoint lists each vertex's id next to its name: https://ci.apache.org/projects/flink/flink-docs-master/monitoring/rest_api.html

For example:

GET http://localhost:8082/jobs/9a6748889bf24987495eead247aeb1ff

Returns:

{
  jid: "9a6748889bf24987495eead247aeb1ff",
  name: "CarTopSpeedWindowingExample",
  isStoppable: false,
  state: "RUNNING",
  start-time: 1582192403413,
  end-time: -1,
  duration: 18533,
  now: 1582192421946,
  timestamps: {FINISHED: 0, FAILING: 0, CANCELED: 0, SUSPENDED: 0, RUNNING: 1582192403550, RECONCILING: 0, FAILED: 0,…},
  vertices: [
    {
      id: "cbc357ccb763df2852fee8c4fc7d55f2",
      name: "Source: Custom Source -> Timestamps/Watermarks",
      parallelism: 1,
      status: "RUNNING",
      start-time: 1582192403754,
      end-time: -1,
      duration: 18192,
      tasks: {CREATED: 0, CANCELED: 0, RECONCILING: 0, FAILED: 0, CANCELING: 0, DEPLOYING: 0, RUNNING: 1,…},
      metrics: {read-bytes: 0, read-bytes-complete: true, write-bytes: 0, write-bytes-complete: true, read-records: 0,…}
    },
    {
      id: "90bea66de1c231edf33913ecd54406c1",
      name: "Window(GlobalWindows(), DeltaTrigger, TimeEvictor, ComparableAggregator, PassThroughWindowFunction) -> Sink: Print to Std. Out",
      parallelism: 1,
      status: "RUNNING",
      start-time: 1582192403759,
      end-time: -1,
      duration: 18187,
      tasks: {CREATED: 0, CANCELED: 0, RECONCILING: 0, FAILED: 0, CANCELING: 0, DEPLOYING: 0, RUNNING: 1,…},
      metrics: {read-bytes: 4669, read-bytes-complete: true, write-bytes: 0, write-bytes-complete: true,…}
    }
  ],
  status-counts: {CREATED: 0, CANCELED: 0, RECONCILING: 0, FAILED: 0, CANCELING: 0, DEPLOYING: 0, RUNNING: 2,…},
  plan: {jid: "9a6748889bf24987495eead247aeb1ff", name: "CarTopSpeedWindowingExample",…}
}
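
If you want to script the lookup, here is a minimal Python sketch that reads the vertices list from the response above and keeps the entries whose name mentions a sink. The function names (fetch_job_details, find_sink_vertices) are my own, and matching on the substring "Sink:" is only a heuristic based on the vertex names in this example. I am also assuming the vertex id is what you can compare against the operator_id label; with chained operators that may not hold, so please verify it against your own job.

```python
import json
from urllib.request import urlopen


def fetch_job_details(base_url, job_id):
    """GET <base_url>/jobs/<job_id> and parse the JSON response body."""
    with urlopen(f"{base_url}/jobs/{job_id}") as response:
        return json.load(response)


def find_sink_vertices(job_details):
    """Return (id, name) pairs for vertices whose name mentions a sink.

    Heuristic (assumption): a sink shows up as "Sink:" in the vertex name,
    as in the example response. Adjust the match for your own job.
    """
    return [
        (vertex["id"], vertex["name"])
        for vertex in job_details.get("vertices", [])
        if "Sink:" in vertex["name"]
    ]


# Usage against a running cluster (host, port and job id taken from the
# example request; replace them with your own):
#
#   details = fetch_job_details("http://localhost:8082",
#                               "9a6748889bf24987495eead247aeb1ff")
#   for vertex_id, name in find_sink_vertices(details):
#       print(vertex_id, name)
```

The filtering is kept separate from the HTTP call so it can be tried on a saved response without a running cluster.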

On Tue, Feb 18, 2020 at 5:01 PM Morgan Geldenhuys <morgan.geldenhuys@tu-berlin.de> wrote:
Hi All,

I have set up monitoring for Flink (1.9.2) via Prometheus and am interested in viewing the 95th-percentile end-to-end latency at the sink operators. I have enabled latency markers at the operator level and can see the results; one of the entries looks as follows:

flink_taskmanager_job_latency_source_id_operator_id_operator_subtask_index_latency{app="flink",component="taskmanager",host="flink_taskmanager_6bdc8fc49_kr4bs",instance="10.244.18.2:9999",job="kubernetes-pods",job_id="96d32d8e380dc267bd69403fd7e20adf",job_name="Traffic",kubernetes_namespace="default",kubernetes_pod_name="flink-taskmanager-6bdc8fc49-kr4bs",operator_id="2e32dc82f03b1df764824a4773219c97",operator_subtask_index="7",pod_template_hash="6bdc8fc49",quantile="0.95",source_id="cbc357ccb763df2852fee8c4fc7d55f2",tm_id="7fb02c0ed734ed1815fa51373457434f"}

That is great; however, I am unable to determine which of the operators is the sink operator I'm looking for based solely on the operator_id. Is there a way of determining this?

Regards,
M.