Mailing-List: contact dev-help@kafka.apache.org; run by ezmlm
Reply-To: dev@kafka.apache.org
MIME-Version: 1.0
From: Tamás Máté
Date: Tue, 15 Aug 2017 14:49:49 +0200
Subject: [KAFKA-5138] MirrorMaker doesn't exit on send failure occasionally
To: dev@kafka.apache.org
Content-Type:
multipart/alternative; boundary="f4030435b7480778fd0556ca3755"
archived-at: Tue, 15 Aug 2017 12:49:56 -0000

--f4030435b7480778fd0556ca3755
Content-Type: text/plain; charset="UTF-8"

Hi Guys,

I started working on this ticket a little more than a week ago:
https://issues.apache.org/jira/browse/KAFKA-5138

Sadly, I could not reproduce it, but from the logs Dustin provided and from
the code, it seems this might not be just a MirrorMaker issue but a consumer
one. My theory is:

1) An MM send failure happens because of heavy load.
2) MM starts to close its producer.
3) During MM shutdown, the source server starts a consumer rebalance (the
   consumers couldn't respond because of the heavy load).
4) The heartbeat response gets delayed.
5) The MM producer is closed, but MM gets a heartbeat response and resets the
   connection.
6) Because there is a thread left in the JVM, it can't shut down.
7) MM hangs.

The order may be a bit different; I can't prove it without a reproduction.

I tried values under 100 ms for the following configs and then stress tested
the source cluster with JMeter:

- request.timeout.ms
- replica.lag.time.max.ms
- session.timeout.ms
- group.min.session.timeout.ms
- group.max.session.timeout.ms
- heartbeat.interval.ms

Could you give me some pointers on how I could reproduce this issue?

Thanks,
Tamas

--f4030435b7480778fd0556ca3755--
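
For illustration only, the timeout settings listed in the message can be grouped into broker-side and consumer-side overrides and sanity-checked before a stress run. This is a minimal sketch: the concrete values, the grouping of request.timeout.ms with the consumer, and the helper names check_timeouts and to_properties are assumptions, since the message only says the settings were tried under 100 ms. The two checks reflect documented Kafka constraints: session.timeout.ms has to fall within the broker's group.min/max.session.timeout.ms range, and heartbeat.interval.ms has to be lower than session.timeout.ms.

```python
# Hypothetical values; the original message only says "under 100ms".
BROKER_OVERRIDES = {
    "replica.lag.time.max.ms": 90,
    "group.min.session.timeout.ms": 10,
    "group.max.session.timeout.ms": 90,
}

# Assumed to be applied to the MirrorMaker consumer config.
CONSUMER_OVERRIDES = {
    "session.timeout.ms": 60,
    "heartbeat.interval.ms": 20,
    "request.timeout.ms": 90,
}


def check_timeouts(broker, consumer):
    """Return a list of constraint violations (empty if none were found)."""
    problems = []
    session = consumer["session.timeout.ms"]
    lowest = broker["group.min.session.timeout.ms"]
    highest = broker["group.max.session.timeout.ms"]
    # The group coordinator rejects session timeouts outside this range.
    if not lowest <= session <= highest:
        problems.append(
            f"session.timeout.ms={session} is outside [{lowest}, {highest}]"
        )
    # Heartbeats must be sent more often than the session expires.
    if consumer["heartbeat.interval.ms"] >= session:
        problems.append("heartbeat.interval.ms must be lower than session.timeout.ms")
    return problems


def to_properties(settings):
    """Render a dict as Java-style .properties lines, sorted by key."""
    return "\n".join(f"{key}={value}" for key, value in sorted(settings.items()))


if __name__ == "__main__":
    for issue in check_timeouts(BROKER_OVERRIDES, CONSUMER_OVERRIDES):
        print("WARNING:", issue)
    print("# broker overrides")
    print(to_properties(BROKER_OVERRIDES))
    print("# consumer overrides")
    print(to_properties(CONSUMER_OVERRIDES))
```

Checking the combination up front avoids a run where the consumer simply fails to join the group, which would hide the shutdown behaviour the stress test is meant to provoke.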