From: Marcos Juarez
Date: Mon, 2 Oct 2017 18:55:25 -0600
Subject: Consumer Offsets partition skew on Kafka 0.10.1.1
To: dev@kafka.apache.org

I was investigating some performance issues we're seeing in one of our
production clusters, and I ran into extremely unbalanced partitions in the
__consumer_offsets topic. I only pasted the top 8 below, out of 50 total.

As you can see, the top 4 partitions alone account for about 84% of the
commit volume (about 90% for the top 5), and brokers 9 and 10 show up
repeatedly, both as leaders and as replicas.

Partition  Offsets         Percentage  Leader  Replicas   ISR
6          52,761,610,477  34.24%      10      (10,6,7)   (7,6,10)
5          46,196,021,230  29.98%      9       (9,5,6)    (5,6,9)
42         17,530,298,423  11.38%      10      (10,9,11)  (10,11,9)
31         12,927,081,106  8.39%       11      (11,9,10)  (10,11,9)
0          8,557,903,671   5.55%       4       (4,12,1)   (4,12,1)
2          3,969,232,652   2.58%       6       (6,2,3)    (6,3,2)
49         3,555,754,347   2.31%       5       (5,11,7)   (5,7,11)
33         2,273,951,745   1.48%       1       (1,11,12)  (1,12,11)

Those brokers (9, 10 and 11) also happen to be the ones we're having
performance issues with. We can't be sure yet that this is the cause of the
performance issues, but it's looking extremely likely.

So, I was wondering, what can be done to "rebalance" these consumer offsets?
As far as I know, this assignment was decided automatically; I don't believe
we ever changed a setting related to it. I also don't believe we can
influence which partition gets which offsets when consuming.

It would also be interesting to know what algorithm or pattern is used to
decide the consumer offset partition. Is this something we can change or
influence?

Thanks,

Marcos Juarez
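
The per-broker load implied by the table can be cross-checked with a short script. This is only a minimal sketch: it uses just the eight rows pasted above (the other 42 partitions are not shown, so totals are relative to the listed rows only), and the names ROWS, load_by_leader and share_of_listed are hypothetical helpers, not part of any Kafka tooling.

```python
from collections import defaultdict

# (partition, offsets, leader broker) for the 8 rows pasted in the message.
# The remaining 42 partitions are not available, so they are not included.
ROWS = [
    (6, 52_761_610_477, 10),
    (5, 46_196_021_230, 9),
    (42, 17_530_298_423, 10),
    (31, 12_927_081_106, 11),
    (0, 8_557_903_671, 4),
    (2, 3_969_232_652, 6),
    (49, 3_555_754_347, 5),
    (33, 2_273_951_745, 1),
]


def load_by_leader(rows):
    """Sum the offsets of the listed partitions per leader broker."""
    totals = defaultdict(int)
    for _partition, offsets, leader in rows:
        totals[leader] += offsets
    return dict(totals)


def share_of_listed(rows, leaders):
    """Fraction of the listed offsets whose partition leader is in `leaders`."""
    total = sum(offsets for _partition, offsets, _leader in rows)
    led = sum(offsets for _partition, offsets, leader in rows if leader in leaders)
    return led / total


if __name__ == "__main__":
    # Print brokers from most to least loaded, based on the listed rows only.
    for broker, offsets in sorted(load_by_leader(ROWS).items(), key=lambda kv: -kv[1]):
        print(f"broker {broker}: {offsets:,} offsets")
    print(f"brokers 9, 10, 11: {share_of_listed(ROWS, {9, 10, 11}):.1%} of listed offsets")
```

Going by the Percentage column instead, the four partitions led by brokers 9, 10 and 11 sum to 34.24 + 29.98 + 11.38 + 8.39 = 83.99% of all commits.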