From: Robert Metzger
Date: Sat, 6 Feb 2016 13:28:35 +0100
Subject: Re: Performance insights
To: "user@flink.apache.org"
You can count the number of elements per key. This allows you to see how they are distributed.
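A minimal sketch of that counting step, written in plain Python rather than as Flink code. The sample records and the helper names `count_per_key` and `heaviest_keys` are hypothetical and only for illustration; in an actual job the same idea would be a grouped count on the key field.

```python
from collections import Counter


def count_per_key(records, key_fn):
    """Return a Counter mapping each key to the number of records that carry it."""
    return Counter(key_fn(r) for r in records)


def heaviest_keys(counts, top_n=3):
    """Return up to top_n (key, count, share_of_total) triples, largest count first."""
    total = sum(counts.values())
    if total == 0:
        return []
    return [(key, n, n / total) for key, n in counts.most_common(top_n)]


# Hypothetical sample: (key, payload) pairs in which key "a" dominates.
records = [("a", 1), ("a", 2), ("a", 3), ("a", 4), ("b", 5), ("c", 6)]

counts = count_per_key(records, key_fn=lambda r: r[0])
for key, n, share in heaviest_keys(counts):
    print(f"{key}: {n} records ({share:.0%} of total)")
```

A key whose share is far above 1 / (number of distinct keys) is a likely cause of the skew seen on a single subtask.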

On Sat, Feb 6, 2016 at 1:23 PM, Flavio Pompermaier <pompermaier@okkam.it> wrote:

> And what if I detect some skewness in some task? Do I have to try to call
> rebalance()? Is there a way to identify the keys causing the skewness?
>
> On 5 Feb 2016 21:33, "Ufuk Celebi" <uce@apache.org> wrote:
>
>> > On 05 Feb 2016, at 16:38, Flavio Pompermaier <pompermaier@okkam.it> wrote:
>> >
>> > Is there an easy way to understand if and when my data get skewed in the pipeline?
>>
>> Yes, the web frontend shows how many bytes and records the sub tasks send
>> and receive respectively. Skew would show as some tasks having higher
>> numbers than the others.
>>
>> – Ufuk

