Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id B4441200D4C for ; Thu, 30 Nov 2017 15:45:09 +0100 (CET) Received: by cust-asf.ponee.io (Postfix) id B2B97160BF4; Thu, 30 Nov 2017 14:45:09 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id AD23F160BEA for ; Thu, 30 Nov 2017 15:45:08 +0100 (CET) Received: (qmail 89957 invoked by uid 500); 30 Nov 2017 14:45:07 -0000 Mailing-List: contact jira-help@kafka.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: jira@kafka.apache.org Delivered-To: mailing list jira@kafka.apache.org Received: (qmail 89945 invoked by uid 99); 30 Nov 2017 14:45:07 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd1-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 30 Nov 2017 14:45:07 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd1-us-west.apache.org (ASF Mail Server at spamd1-us-west.apache.org) with ESMTP id 6312CC5414 for ; Thu, 30 Nov 2017 14:45:07 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd1-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: -99.011 X-Spam-Level: X-Spam-Status: No, score=-99.011 tagged_above=-999 required=6.31 tests=[KAM_ASCII_DIVIDERS=0.8, KB_WAM_FROM_NAME_SINGLEWORD=0.2, SPF_PASS=-0.001, T_RP_MATCHES_RCVD=-0.01, USER_IN_WHITELIST=-100] autolearn=disabled Received: from mx1-lw-eu.apache.org ([10.40.0.8]) by localhost (spamd1-us-west.apache.org [10.40.0.7]) (amavisd-new, port 10024) with ESMTP id DBk4AVDl7pDP for ; Thu, 30 Nov 2017 14:45:02 +0000 (UTC) Received: from mailrelay1-us-west.apache.org (mailrelay1-us-west.apache.org [209.188.14.139]) by mx1-lw-eu.apache.org (ASF Mail Server at mx1-lw-eu.apache.org) with ESMTP id 76F3D5F6CD for ; Thu, 30 Nov 2017 14:45:01 +0000 (UTC) Received: from jira-lw-us.apache.org (unknown [207.244.88.139]) by mailrelay1-us-west.apache.org (ASF Mail Server at mailrelay1-us-west.apache.org) with ESMTP id 9E7FEE0F13 for ; Thu, 30 Nov 2017 14:45:00 +0000 (UTC) Received: from jira-lw-us.apache.org (localhost [127.0.0.1]) by jira-lw-us.apache.org (ASF Mail Server at jira-lw-us.apache.org) with ESMTP id 5D30E2105E for ; Thu, 30 Nov 2017 14:45:00 +0000 (UTC) Date: Thu, 30 Nov 2017 14:45:00 +0000 (UTC) From: "Seweryn Habdank-Wojewodzki (JIRA)" To: jira@kafka.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Comment Edited] (KAFKA-6260) AbstractCoordinator not clearly handles NULL Exception MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 archived-at: Thu, 30 Nov 2017 14:45:09 -0000 [ https://issues.apache.org/jira/browse/KAFKA-6260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16272741#comment-16272741 ] Seweryn Habdank-Wojewodzki edited comment on KAFKA-6260 at 11/30/17 2:44 PM: ----------------------------------------------------------------------------- [~hachikuji] * Can you tell me exactly (exact setting name), which settings are logically combined, so I can set them respectively to our timings including our wait for end-clients results, please? * Yes I can test any hot fixes, the only matter is how can I get them into our build process :-). was (Author: habdank): [~hachikuji] * Can you tell me exactly which settings are logically combined, so I can set them respectively to our timings including our wait for end-clients results, please? * Yes I can test any hot fixes, the only matter is how can I get them into our build process :-). > AbstractCoordinator not clearly handles NULL Exception > ------------------------------------------------------ > > Key: KAFKA-6260 > URL: https://issues.apache.org/jira/browse/KAFKA-6260 > Project: Kafka > Issue Type: Bug > Affects Versions: 1.0.0 > Environment: RedHat Linux > Reporter: Seweryn Habdank-Wojewodzki > Assignee: Jason Gustafson > Fix For: 1.1.0, 1.0.1 > > > The error reporting is not clear. But it seems that Kafka Heartbeat shuts down application due to NULL exception caused by "fake" disconnections. > One more comment. We are processing messages in the stream, but sometimes we have to block processing for minutes, as consumers are not handling too much load. Is it possibble that when stream is waiting, then heartbeat is as well blocked? > Can you check that? > {code} > 2017-11-23 23:54:47 DEBUG AbstractCoordinator:177 - [Consumer clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer, groupId=kafka-endpoint] Received successful Heartbeat response > 2017-11-23 23:54:50 DEBUG AbstractCoordinator:183 - [Consumer clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer, groupId=kafka-endpoint] Sending Heartbeat request to coordinator cljp01.eb.lan.at:9093 (id: 2147483646 rack: null) > 2017-11-23 23:54:50 TRACE NetworkClient:135 - [Consumer clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer, groupId=kafka-endpoint] Sending HEARTBEAT {group_id=kafka-endpoint,generation_id=3834,member_id=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer-94f18be5-e49a-4817-9e5a-fe82a64e0b08} with correlation id 24 to node 2147483646 > 2017-11-23 23:54:50 TRACE NetworkClient:135 - [Consumer clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer, groupId=kafka-endpoint] Completed receive from node 2147483646 for HEARTBEAT with correlation id 24, received {throttle_time_ms=0,error_code=0} > 2017-11-23 23:54:50 DEBUG AbstractCoordinator:177 - [Consumer clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer, groupId=kafka-endpoint] Received successful Heartbeat response > 2017-11-23 23:54:52 DEBUG NetworkClient:183 - [Consumer clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer, groupId=kafka-endpoint] Disconnecting from node 1 due to request timeout. > 2017-11-23 23:54:52 TRACE NetworkClient:135 - [Consumer clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer, groupId=kafka-endpoint] Cancelled request {replica_id=-1,max_wait_time=6000,min_bytes=1,max_bytes=52428800,isolation_level=0,topics=[{topic=clj_internal_topic,partitions=[{partition=6,fetch_offset=211558395,log_start_offset=-1,max_bytes=1048576},{partition=8,fetch_offset=210178209,log_start_offset=-1,max_bytes=1048576},{partition=0,fetch_offset=209353523,log_start_offset=-1,max_bytes=1048576},{partition=2,fetch_offset=209291462,log_start_offset=-1,max_bytes=1048576},{partition=4,fetch_offset=210728595,log_start_offset=-1,max_bytes=1048576}]}]} with correlation id 21 due to node 1 being disconnected > 2017-11-23 23:54:52 DEBUG ConsumerNetworkClient:195 - [Consumer clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer, groupId=kafka-endpoint] Cancelled FETCH request RequestHeader(apiKey=FETCH, apiVersion=6, clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer, correlationId=21) with correlation id 21 due to node 1 being disconnected > 2017-11-23 23:54:52 DEBUG Fetcher:195 - [Consumer clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer, groupId=kafka-endpoint] Fetch request {clj_internal_topic-6=(offset=211558395, logStartOffset=-1, maxBytes=1048576), clj_internal_topic-8=(offset=210178209, logStartOffset=-1, maxBytes=1048576), clj_internal_topic-0=(offset=209353523, logStartOffset=-1, maxBytes=1048576), clj_internal_topic-2=(offset=209291462, logStartOffset=-1, maxBytes=1048576), clj_internal_topic-4=(offset=210728595, logStartOffset=-1, maxBytes=1048576)} to cljp01.eb.lan.at:9093 (id: 1 rack: DC-1) failed org.apache.kafka.common.errors.DisconnectException: null > 2017-11-23 23:54:52 TRACE NetworkClient:123 - [Consumer clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer, groupId=kafka-endpoint] Found least loaded node cljp01.eb.lan.at:9093 (id: 1 rack: DC-1) > 2017-11-23 23:54:52 DEBUG NetworkClient:183 - [Consumer clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer, groupId=kafka-endpoint] Initialize connection to node cljp01.eb.lan.at:9093 (id: 1 rack: DC-1) for sending metadata request > 2017-11-23 23:54:52 DEBUG NetworkClient:183 - [Consumer clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer, groupId=kafka-endpoint] Initiating connection to node cljp01.eb.lan.at:9093 (id: 1 rack: DC-1) > 2017-11-23 23:54:52 ERROR AbstractCoordinator:296 - [Consumer clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer, groupId=kafka-endpoint] Heartbeat thread for group kafka-endpoint failed due to unexpected error > java.lang.NullPointerException: null > at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:436) ~[my-kafka-endpoint.jar:?] > at org.apache.kafka.common.network.Selector.poll(Selector.java:395) ~[my-kafka-endpoint.jar:?] > at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:460) ~[my-kafka-endpoint.jar:?] > at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:238) ~[my-kafka-endpoint.jar:?] > at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.pollNoWakeup(ConsumerNetworkClient.java:275) ~[my-kafka-endpoint.jar:?] > at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$HeartbeatThread.run(AbstractCoordinator.java:934) [my-kafka-endpoint.jar:?] > 2017-11-23 23:54:52 DEBUG AbstractCoordinator:177 - [Consumer clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer, groupId=kafka-endpoint] Heartbeat thread has closed > 2017-11-23 23:55:16 INFO KafkaElasticsearchEndpoint:61 - Got shutdown requests ... > 2017-11-23 23:55:16 INFO StreamProcessor:106 - Streams closing ... > 2017-11-23 23:55:16 INFO StreamProcessor:111 - with 5 [s] timeout > 2017-11-23 23:55:16 DEBUG KafkaStreams:183 - stream-client [kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31]Stopping Streams client with timeoutMillis = 5000 ms. > 2017-11-23 23:55:16 INFO KafkaStreams:346 - stream-client [kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31]State transition from RUNNING to PENDING_SHUTDOWN > 2017-11-23 23:55:16 INFO StreamThread:336 - stream-thread [kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1] Informed to shut down > 2017-11-23 23:55:16 INFO StreamThread:346 - stream-thread [kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1] State transition from RUNNING to PENDING_SHUTDOWN > {code} > Kafka settings: > {code} > auto.commit.interval.ms = 5000 > auto.offset.reset = earliest > bootstrap.servers = [xxx:9093, yyy:9093] > check.crcs = true > client.id = kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer > connections.max.idle.ms = 540000 > enable.auto.commit = false > exclude.internal.topics = true > fetch.max.bytes = 52428800 > fetch.max.wait.ms = 6000 > fetch.min.bytes = 1 > group.id = kafka-endpoint > heartbeat.interval.ms = 3000 > interceptor.classes = null > internal.leave.group.on.close = false > isolation.level = read_uncommitted > key.deserializer = class org.apache.kafka.common.serialization.ByteArrayDeserializer > max.partition.fetch.bytes = 1048576 > max.poll.interval.ms = 10000000 > max.poll.records = 2000 > metadata.max.age.ms = 300000 > metric.reporters = [] > metrics.num.samples = 2 > metrics.recording.level = INFO > metrics.sample.window.ms = 30000 > partition.assignment.strategy = [org.apache.kafka.streams.processor.internals.StreamPartitionAssignor] > receive.buffer.bytes = 65536 > reconnect.backoff.max.ms = 1000 > reconnect.backoff.ms = 50 > request.timeout.ms = 7000 > retry.backoff.ms = 100 > sasl.jaas.config = null > sasl.kerberos.kinit.cmd = /usr/bin/kinit > sasl.kerberos.min.time.before.relogin = 60000 > sasl.kerberos.service.name = null > sasl.kerberos.ticket.renew.jitter = 0.05 > sasl.kerberos.ticket.renew.window.factor = 0.8 > sasl.mechanism = GSSAPI > security.protocol = SSL > send.buffer.bytes = 131072 > session.timeout.ms = 6000 > ssl.cipher.suites = null > ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1] > ssl.endpoint.identification.algorithm = null > ssl.key.password = [hidden] > ssl.keymanager.algorithm = SunX509 > ssl.keystore.location = /data/my/etc/kafka/client-keystore > ssl.keystore.password = [hidden] > ssl.keystore.type = JKS > ssl.protocol = TLS > ssl.provider = null > ssl.secure.random.implementation = SHA1PRNG > ssl.trustmanager.algorithm = PKIX > ssl.truststore.location = /data/my/etc/kafka/truststore > ssl.truststore.password = [hidden] > ssl.truststore.type = JKS > value.deserializer = class org.apache.kafka.common.serialization.ByteArrayDeserializer > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)