Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 6CE6B200C60 for ; Mon, 24 Apr 2017 19:24:27 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id 6B6D0160B99; Mon, 24 Apr 2017 17:24:27 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 8C0F5160B93 for ; Mon, 24 Apr 2017 19:24:26 +0200 (CEST) Received: (qmail 92050 invoked by uid 500); 24 Apr 2017 17:24:25 -0000 Mailing-List: contact dev-help@samza.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@samza.apache.org Delivered-To: mailing list dev@samza.apache.org Received: (qmail 92032 invoked by uid 99); 24 Apr 2017 17:24:24 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd4-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 24 Apr 2017 17:24:24 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd4-us-west.apache.org (ASF Mail Server at spamd4-us-west.apache.org) with ESMTP id B0F7DC028B for ; Mon, 24 Apr 2017 17:24:23 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd4-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 4.63 X-Spam-Level: **** X-Spam-Status: No, score=4.63 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, FREEMAIL_ENVFROM_END_DIGIT=0.25, HTML_MESSAGE=2, KAM_BADIPHTTP=2, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01, RCVD_IN_SORBS_SPAM=0.5, SPF_PASS=-0.001, WEIRD_PORT=0.001] autolearn=disabled Authentication-Results: spamd4-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd4-us-west.apache.org [10.40.0.11]) (amavisd-new, port 10024) with ESMTP id wm4QpIpHkP1m for ; Mon, 24 Apr 2017 17:24:19 +0000 (UTC) Received: from mail-it0-f48.google.com (mail-it0-f48.google.com [209.85.214.48]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTPS id 52A955FB7A for ; Mon, 24 Apr 2017 17:24:19 +0000 (UTC) Received: by mail-it0-f48.google.com with SMTP id e132so61412855ite.1 for ; Mon, 24 Apr 2017 10:24:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:in-reply-to:references:from:date:message-id:subject:to; bh=1uh0sPyHWId+CZ6mqZWLg6ov94ESZHSt5mtgMGe6JTY=; b=kn7UqnbImJsfgXA+E7EeDlAsU0eBPU8BO3O3Sxy9VPVPvVpq/lKWMzf8RySzHQkXym RATt+i1NPqTyNiYmcvSWa7sLUluuZb14xAJINiSVsHZ4rI5GHiqKmlblt5dBftzZtXNa HHG0Woa8N9PiMQzngK3zFxUPYypeUI//IArqCYwkph1cXahSj9J1WfmL9c5iGb/T8CBc Lr7vN2VsTQy1gD+ijvnDv2S8YKdNURf9EbvN04S/A2RW8PEmYJnfNnor2wm/GlrHFsE0 HY5dHygGDLDnWjHDuEOmuyBvumu/VYg8EBxOSeleYsYy5fSGN86fkTURi97tfjOOtIJ+ exBw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to; bh=1uh0sPyHWId+CZ6mqZWLg6ov94ESZHSt5mtgMGe6JTY=; b=GEvRtYnG8oqg/VOGuGFin9pOMAJLOvZzKrM5lMvLXcuKLoDqGtm/ZIbUiQYG7FxJqn RpoYtJwHs0HzwYt80YYlprnU/CgD6qJpMeTnJuT7Z4Lj3ye9XwWf6AnXvnKGaaxGA0T7 hdBm1QExDgEVbuEzAY6/95RR1mLX24s14q8lcWCG3qYaNJHcvQsa3vVzN2fsSDhAUszO xstQ0GmJeEvLUF83kLJ5q5jeNAIyo0QbU3favkkI2iOZmRZ2GMb9r/MxhLlzVstyTvrL 4fLqiVwg/vUSJlG5cahqWpRZcjYA7SyMUvLGvf/6PTzPj1X8HubGtDnj8bOo9gVZelky YJ7g== X-Gm-Message-State: AN3rC/61V8dpNJsG1Y/+bPX74KnWIOHIpokAYhjDnrorwF81WVLFy5SC MpbrfZv7pUBlXpHcGLQu/UQw3DHVRZfp X-Received: by 10.36.185.5 with SMTP id w5mr14360626ite.1.1493054658356; Mon, 24 Apr 2017 10:24:18 -0700 (PDT) MIME-Version: 1.0 Received: by 10.36.115.82 with HTTP; Mon, 24 Apr 2017 10:24:17 -0700 (PDT) In-Reply-To: <913C7935-C448-4F37-8087-0928C1AA5D1C@eefung.com> References: <913C7935-C448-4F37-8087-0928C1AA5D1C@eefung.com> From: Jagadish Venkatraman Date: Mon, 24 Apr 2017 10:24:17 -0700 Message-ID: Subject: Re: About reconnect times? To: dev@samza.apache.org Content-Type: multipart/alternative; boundary=f403045d990e8ab82d054dece0cb archived-at: Mon, 24 Apr 2017 17:24:27 -0000 --f403045d990e8ab82d054dece0cb Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Hi ShuQi, My apologies for the late reply. There are 2 categories of exceptions here (both occurring presumably due to your Kafka broker failure) *Producer side:* - This is a network exception from the *Sender* instance inside the *KafkaProducer* used by Samza. - The default number of retries in Samza is MAX_INT. You can configure retries by over-riding: systems.system-name.producer.retries More generally, any Kafka property can be over-ridden as follows: systems.system-name. producer.* Any Kafka producer configuration can be included here. For example, to change the request timeout, you can set systems.system-name.producer.timeout.ms. (There is no need to configure client.id as it is automatically configured by Samza.) *Consumer-side:* - The exception is a timeout triggered from the DefaultFetchSimpleConsumer. It happens in a separate thread where we poll for messages, and hence, should not affect the *SamzaContainer* main thread. - The default behavior is to attempt a re-connect, and then re-create the Consumer instance. The number of reconnect attempts is unbounded (and not configurable). Best, Jagadish On Tue, Apr 18, 2017 at 10:51 PM, =E8=88=92=E7=90=A6 wro= te: > Hi Guys, > > One of brokers in Kafka cluster is going down, the samza got the > following exception: > > 2017-04-19 10:42:36.751 [SAMZA-BROKER-PROXY-BrokerProxy thread pointed at > 172.19.105.20:9096 for client samza_consumer-canal_status_content_distinc= t-1] > DefaultFetchSimpleConsumer [INFO] Reconnect due to error: > java.net.SocketTimeoutException > at sun.nio.ch.SocketAdaptor$SocketInputStream.read( > SocketAdaptor.java:211) > at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103= ) > at java.nio.channels.Channels$ReadableByteChannelImpl.read( > Channels.java:385) > at org.apache.kafka.common.network.NetworkReceive. > readFromReadableChannel(NetworkReceive.java:81) > at kafka.network.BlockingChannel.readCompletely( > BlockingChannel.scala:129) > at kafka.network.BlockingChannel.receive(BlockingChannel.scala: > 120) > at kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer. > scala:86) > at kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$ > $sendRequest(SimpleConsumer.scala:83) > at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$ > apply$mcV$sp$1.apply$mcV$sp(SimpleConsumer.scala:132) > at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$ > apply$mcV$sp$1.apply(SimpleConsumer.scala:132) > at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$ > apply$mcV$sp$1.apply(SimpleConsumer.scala:132) > at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33) > at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply$mcV$sp( > SimpleConsumer.scala:131) > at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply( > SimpleConsumer.scala:131) > at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply( > SimpleConsumer.scala:131) > at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33) > at kafka.consumer.SimpleConsumer.fetch(SimpleConsumer.scala:130) > at org.apache.samza.system.kafka.DefaultFetchSimpleConsumer.fetch= ( > DefaultFetchSimpleConsumer.scala:48) > at org.apache.samza.system.kafka.DefaultFetchSimpleConsumer. > defaultFetch(DefaultFetchSimpleConsumer.scala:41) > at org.apache.samza.system.kafka.BrokerProxy.org$apache$samza$ > system$kafka$BrokerProxy$$fetchMessages(BrokerProxy.scala:179) > at org.apache.samza.system.kafka.BrokerProxy$$anon$1$$anonfun$ > run$1.apply(BrokerProxy.scala:147) > at org.apache.samza.system.kafka.BrokerProxy$$anon$1$$anonfun$ > run$1.apply(BrokerProxy.scala:134) > at org.apache.samza.util.ExponentialSleepStrategy.run( > ExponentialSleepStrategy.scala:82) > at org.apache.samza.system.kafka.BrokerProxy$$anon$1.run( > BrokerProxy.scala:133) > at java.lang.Thread.run(Thread.java:745) > 2017-04-19 10:42:44.507 [kafka-producer-network-thread | > samza_producer-canal_status_content_distinct-1] Sender [WARN] Got error > produce response with correlation id 64783117 on topic-partition > tweets_distinctContent-5, retrying (2147483646 attempts left). Error: > NETWORK_EXCEPTION > > Does =E2=80=9C2147483646 attempts left=E2=80=9D mean that samza w= ill try to > reconnect to broken broker 2147483646 times? > And the log shows that samza keeps connecting to the broken broke= r > and the samza cluster can=E2=80=99t read any new messages even if Kafka c= luster is > fault tolerance. > > How can I override this property: =E2=80=9C2147483646 attempts le= ft=E2=80=9D? > > Thanks. > > =E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=94 > ShuQi > > --=20 Jagadish V, Graduate Student, Department of Computer Science, Stanford University --f403045d990e8ab82d054dece0cb--