From: Till Rohrmann
To: Fritz Budiyanto
Cc: user (user@flink.apache.org)
Date: Tue, 23 May 2017 16:20:11 +0200
Subject: Re: Need help debugging back pressure job

Hi Fritz,

you're right that back pressure should propagate upstream to the sources. Thus, the cause of the back pressure should be the operator following the last operator that shows back pressure.
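
To make the rule concrete, here is a toy model of it. It assumes a linear chain of operators listed from source to sink, each with a single "back pressured or not" flag, such as one might read off the web UI's back pressure view. The Operator record and the likelyCause method are hypothetical illustrations, not Flink API.

```java
import java.util.List;
import java.util.Optional;

public class BackPressureCulprit {
    // Hypothetical model: one operator and whether it shows back pressure.
    record Operator(String name, boolean backPressured) {}

    // Assumes 'chain' is ordered from source to sink with no fan-in or fan-out.
    // Returns the operator right after the last back-pressured one.
    static Optional<String> likelyCause(List<Operator> chain) {
        int lastPressured = -1;
        for (int i = 0; i < chain.size(); i++) {
            if (chain.get(i).backPressured()) {
                lastPressured = i;
            }
        }
        // No back pressure at all, or the last operator of the chain reports it:
        // nothing downstream inside the chain can be blamed.
        if (lastPressured < 0 || lastPressured == chain.size() - 1) {
            return Optional.empty();
        }
        return Optional.of(chain.get(lastPressured + 1).name());
    }

    public static void main(String[] args) {
        // Hypothetical pipeline resembling the one described in the question.
        List<Operator> chain = List.of(
            new Operator("Source", true),
            new Operator("Map", true),
            new Operator("ProcessFunction", false),
            new Operator("Sink", false));
        System.out.println(likelyCause(chain).orElse("none"));
    }
}
```

Running main prints ProcessFunction. With fan-out, the culprit could be any of the downstream operators of the last back-pressured one, so this linear rule is only a first hint.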

In order to debug it, you could take a look at the stack trace of the TM (TaskManager). Simply go to the machine on which the TM runs, find out the process id via jps, and then call jstack with the respective process id.
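
For reference, the per-thread information that jstack prints (thread name, state, the lock a thread is blocked or waiting on, stack frames) can also be read from inside a JVM through the standard java.lang.management API. Below is a minimal sketch. It assumes you can run code in the JVM you want to inspect, for example a local test that reproduces the stuck job. The ThreadDump class is hypothetical, and in a stuck job the interesting entry would be the thread named after the suspected operator.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadDump {
    // Builds a jstack-like text dump of all live threads in the current JVM,
    // keeping at most 'maxFramesPerThread' stack frames per thread.
    static String dump(int maxFramesPerThread) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        // true, true: also collect locked monitors and ownable synchronizers.
        for (ThreadInfo info : bean.dumpAllThreads(true, true)) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState()).append('\n');
            // Non-null when the thread is blocked on or waiting for a lock.
            if (info.getLockName() != null) {
                sb.append("    blocked/waiting on ").append(info.getLockName()).append('\n');
            }
            StackTraceElement[] frames = info.getStackTrace();
            for (int i = 0; i < Math.min(maxFramesPerThread, frames.length); i++) {
                sb.append("    at ").append(frames[i]).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump(10));
    }
}
```

For a separate TaskManager process, jps and jstack remain the simpler route, since they need no code changes in the job.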

Alternatively, you can try to debug the cluster remotely [1].

[1] https://cwiki.apache.org/confluence/display/FLINK/Remote+Debugging+of+Flink+Clusters

Cheers,
Till

On Tue, May 23, 2017 at 7:14 AM, Fritz Budiyanto <fbudiyan@icloud.com> wrote:
> Hi All,
>
> Any tips on debugging back pressure? I have a workload that gets stuck after it has run for a couple of hours.
> I assume the cause of the back pressure is the block next to the one showing as having the back pressure. Is this right?
>
> Any idea on how to get the backtrace? (I'm using a standalone combined JM/TM with a parallelism of 1, and the suspected block is a ProcessFunction with event timers.)
>
> --
> Fritz