From: GitBox
To: commits@pulsar.apache.org
Reply-To: dev@pulsar.apache.org
Subject: [GitHub] [pulsar] fracasula opened a new issue #6627: Ghost consumer
Date: Fri, 27 Mar 2020 14:56:06 -0000

fracasula opened a new issue #6627: Ghost consumer
URL: https://github.com/apache/pulsar/issues/6627

**Describe the bug**

We noticed that some messages were not getting processed by our consumer with a **key shared** subscription. We only had one consumer at that time so we checked the subscriptions via the `pulsar-admin topics stats` command and noticed that it was showing two consumers instead. The application is deployed on a Kubernetes cluster so we scaled all the pods down to make sure that there could be no consumers at all but in the `stats` we could still see the ghost consumer.

Before scaling all the pods down:

```
"cloud-spaceroom-service" : {
  "msgRateOut" : 0.0,
  "msgThroughputOut" : 0.0,
  "msgRateRedeliver" : 0.0,
  "msgBacklog" : 21,
  "blockedSubscriptionOnUnackedMsgs" : false,
  "msgDelayed" : 0,
  "unackedMessages" : 21,
  "type" : "Key_Shared",
  "msgRateExpired" : 0.0,
  "lastExpireTimestamp" : 0,
  "consumers" : [ {
    "msgRateOut" : 0.0,
    "msgThroughputOut" : 0.0,
    "msgRateRedeliver" : 0.0,
    "consumerName" : "a0869e4030",
    "availablePermits" : 0,
    "unackedMessages" : 0,
    "blockedConsumerOnUnackedMsgs" : false,
    "metadata" : { },
    "address" : "/10.1.1.76:33118",
    "connectedSince" : "2020-03-24T09:47:06.109Z"
  }, {
    "msgRateOut" : 0.0,
    "msgThroughputOut" : 0.0,
    "msgRateRedeliver" : 0.0,
    "consumerName" : "0c7db0c8c0",
    "availablePermits" : 1000,
    "unackedMessages" : 0,
    "blockedConsumerOnUnackedMsgs" : false,
    "metadata" : { },
    "address" : "/10.1.1.76:45426",
    "connectedSince" : "2020-03-24T13:53:45.761Z"
  } ],
  "isReplicated" : false
},
```

After scaling all the pods down:

```
"cloud-spaceroom-service" : {
  "msgRateOut" : 0.0,
  "msgThroughputOut" : 0.0,
  "msgRateRedeliver" : 0.0,
  "msgBacklog" : 21,
  "blockedSubscriptionOnUnackedMsgs" : false,
  "msgDelayed" : 0,
  "unackedMessages" : 21,
  "type" : "Key_Shared",
  "msgRateExpired" : 0.0,
  "lastExpireTimestamp" : 0,
  "consumers" : [ {
    "msgRateOut" : 0.0,
    "msgThroughputOut" : 0.0,
    "msgRateRedeliver" : 0.0,
    "consumerName" : "a0869e4030",
    "availablePermits" : 0,
    "unackedMessages" : 0,
    "blockedConsumerOnUnackedMsgs" : false,
    "metadata" : { },
    "address" : "/10.1.1.76:33118",
    "connectedSince" : "2020-03-24T09:47:06.109Z"
  } ],
  "isReplicated" : false
},
```

In order to get rid of the ghost consumer we had to kill the pod with the Pulsar broker. Once Kubernetes restarted the broker the `stats` command finally showed the connected consumers only.

**To Reproduce**

I haven't been able to replicate the issue yet. I do have all the logs centralized and accessible though. It would be very helpful if you could help us understand what went wrong.

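For anyone trying to spot the same condition, here is a minimal sketch of how the two `stats` snapshots above could be compared. It assumes the subscription JSON has already been parsed into dictionaries; the inline data is a trimmed copy of the output above, and the helper names (`consumers_by_name`, `ghost_consumers`, `disconnected`) are hypothetical, not part of Pulsar or `pulsar-admin`.

```python
# Trimmed copies of the "cloud-spaceroom-service" subscription stats shown above.
BEFORE = {
    "type": "Key_Shared",
    "consumers": [
        {"consumerName": "a0869e4030", "availablePermits": 0,
         "address": "/10.1.1.76:33118"},
        {"consumerName": "0c7db0c8c0", "availablePermits": 1000,
         "address": "/10.1.1.76:45426"},
    ],
}
AFTER = {
    "type": "Key_Shared",
    "consumers": [
        {"consumerName": "a0869e4030", "availablePermits": 0,
         "address": "/10.1.1.76:33118"},
    ],
}


def consumers_by_name(subscription):
    """Index a subscription's consumers by their consumerName."""
    return {c["consumerName"]: c for c in subscription.get("consumers", [])}


def ghost_consumers(after_scale_down):
    """Assumption: every client pod is stopped, so any consumer that the
    broker still lists is a ghost."""
    return sorted(consumers_by_name(after_scale_down))


def disconnected(before, after):
    """Consumers that went away as expected between the two snapshots."""
    return sorted(set(consumers_by_name(before)) - set(consumers_by_name(after)))


if __name__ == "__main__":
    print("still listed after scale-down:", ghost_consumers(AFTER))
    print("disconnected as expected:", disconnected(BEFORE, AFTER))
```

The check is only meaningful when it is certain that no client is running, which is why we scaled every pod down before taking the second snapshot.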
**Expected behavior**

I would have expected the consumer to disappear after all the consumer pods were scaled down, which is what happened for the connected consumer but not for the ghost one (i.e. `a0869e4030`, see the `stats` output above). Worse, messages were still being routed to the ghost consumer.

**Additional context**

Known information about the ghost consumer:

* Consumer name: a0869e4030
* Subscriber name: cloud-spaceroom-service
* Subscription type: Key shared
* Connected since: 09:47:06.109Z
* Address: /10.1.1.76:33118
* Topic: /public/default/UserJoinedSpace
* Persistent topic: true

Additional information:

* Client was using the official Golang lib with the underlying C++ library v2.5.0
* There was only one consumer with that subscriber name at that time
* The subscription for that topic was reported by the broker at `09:46:58.581831923Z`
* Prior to establishing the connection the client reported several times (for a period of ~6s) that it was unable to reconnect its consumer and it rescheduled lots of reconnections (from `09:46:57.788` to `09:47:03.631`)
* Around the same time (~`09:47:03.558313436Z`) the broker reported that it was creating 2 subscriptions for that topic and that it already had a consumer with id 18 present on the connection
  * `Consumer with id 18 is already present on the connection`
* The client reported a few times (~`09:47:03.616168539Z`) that it couldn't reconnect the consumer due to an unknown error
* All logs within the Pulsar namespace go silent about the `UserJoinedSpace` topic at `09:47:06.367671431Z` and start again at `09:51:21.172230502Z`, which makes for a 254805 ms gap (~4.25 minutes)

More related logs:

* `Removed consumer Consumer{subscription=PersistentSubscription{topic=persistent://public/default/UserJoinedSpace, name=nodes-service}, consumerId=0, consumerName=71ea78f19e, address=/10.1.1.76:33122} with pending 0 acks`
* `[/10.1.1.76:33118] Cleared consumer created after timeout on client side Consumer{subscription=PersistentSubscription{topic=persistent://public/default/UserJoinedSpace, name=cloud-spaceroom-service}, consumerId=18, consumerName=a0869e4030, address=/10.1.1.76:33118}`

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

With regards,
Apache Git Services