Date: Thu, 26 Oct 2017 02:17:00 +0000 (UTC)
From: "Mangesh Deshmukh (JIRA)"
To: issues@geode.apache.org
Reply-To: dev@geode.apache.org
Subject: [jira] [Commented] (GEODE-3709) Geode Version: 1.1.1 In one of the project we a...

[ https://issues.apache.org/jira/browse/GEODE-3709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16219856#comment-16219856 ]

Mangesh Deshmukh commented on GEODE-3709:
-----------------------------------------

I agree that the logs and tcpdump do not give us enough information to tell where the problem really lies. I had also noticed the window size degradation; I saw it across multiple runs, and it seems to be a common theme in this issue.

To troubleshoot further, we could do a custom build with some additional log statements. Please let me know where it would be appropriate to put them. Is this the right place?
{code:java}
// Message.java
void flushBuffer() throws IOException {
  final ByteBuffer cb = getCommBuffer();
  if (this.socketChannel != null) {
    cb.flip();
    do {
      this.socketChannel.write(cb); // <-- line highlighted as the proposed logging point
    } while (cb.remaining() > 0);
  } else {
    this.outputStream.write(cb.array(), 0, cb.position());
  }
  if (this.messageStats != null) {
    this.messageStats.incSentBytes(cb.position());
  }
  cb.clear();
}
{code}

This should give us the exact time at which each message is written, from the application's point of view.

> Geode Version: 1.1.1 In one of the project we a...
> -----------------------------------------------------
>
>                 Key: GEODE-3709
>                 URL: https://issues.apache.org/jira/browse/GEODE-3709
>             Project: Geode
>          Issue Type: Improvement
>          Components: client queues
>            Reporter: Gregory Chase
>         Attachments: 20171006-logs-stats-tds.zip, 20171020.zip, CacheClientProxyStats_sentBytes.gif, DistributionStats_receivedBytes_CacheClientProxyStats_sentBytes.gif, gf-rest-stats-12-05.gfs, myStatisticsArchiveFile-04-01.gfs
>
>
> Geode Version: 1.1.1
> We are using Geode in one of our projects. Here is a summary of how we use it.
> - Geode servers have multiple regions.
> - Clients subscribe to the data from these regions.
> - Clients register interest in all the entries, so they receive updates for every entry, from creation through modification to deletion.
> - One of the regions usually has 5-10 million entries with a TTL of 24 hours. Most entries are added within an hour's span, one after another, so when the TTL kicks in they are often destroyed within an hour as well.
> Problem:
> Every now and then we observe the following message:
> Client queue for _gfe_non_durable_client_with_id_x.x.x.x(14229:loner):42754:e4266fc4_2_queue client is full.
> This seems to happen when the TTL kicks in on the region with 5-10 million entries. Entries start getting evicted (deleted); the updates (destroys) now must be sent to clients.
> We see that the updates do happen for a while, but then they suddenly stop and the queue size starts growing. This is becoming a major issue for the smooth functioning of our production setup. Any help will be much appreciated.
> I did some groundwork by downloading and looking at the code. I see references to two issues, #37581 and #51400, but I am unable to view the actual JIRA tickets (they need login credentials). Hopefully this helps someone looking at the issue.
> Here is the pertinent code:
>   @Override
>   @edu.umd.cs.findbugs.annotations.SuppressWarnings("TLW_TWO_LOCK_WAIT")
>   void checkQueueSizeConstraint() throws InterruptedException {
>     if (this.haContainer instanceof HAContainerMap && isPrimary()) { // Fix for bug 39413
>       if (Thread.interrupted())
>         throw new InterruptedException();
>       synchronized (this.putGuard) {
>         if (putPermits <= 0) {
>           synchronized (this.permitMon) {
>             if (reconcilePutPermits() <= 0) {
>               if (region.getSystem().getConfig().getRemoveUnresponsiveClient()) {
>                 isClientSlowReciever = true;
>               } else {
>                 try {
>                   long logFrequency = CacheClientNotifier.DEFAULT_LOG_FREQUENCY;
>                   CacheClientNotifier ccn = CacheClientNotifier.getInstance();
>                   if (ccn != null) { // check needed for junit tests
>                     logFrequency = ccn.getLogFrequency();
>                   }
>                   if ((this.maxQueueSizeHitCount % logFrequency) == 0) {
>                     logger.warn(LocalizedMessage.create(
>                         LocalizedStrings.HARegionQueue_CLIENT_QUEUE_FOR_0_IS_FULL,
>                         new Object[] {region.getName()}));
>                     this.maxQueueSizeHitCount = 0;
>                   }
>                   ++this.maxQueueSizeHitCount;
>                   this.region.checkReadiness(); // fix for bug 37581
>                   // TODO: wait called while holding two locks
>                   this.permitMon.wait(CacheClientNotifier.eventEnqueueWaitTime);
>                   this.region.checkReadiness(); // fix for bug 37581
>                   // Fix for #51400. Allow the queue to grow beyond its
>                   // capacity/maxQueueSize, if it is taking a long time to
>                   // drain the queue, either due to a slower client or the
>                   // deadlock scenario mentioned in the ticket.
>                   reconcilePutPermits();
>                   if ((this.maxQueueSizeHitCount % logFrequency) == 1) {
>                     logger.info(LocalizedMessage
>                         .create(LocalizedStrings.HARegionQueue_RESUMING_WITH_PROCESSING_PUTS));
>                   }
>                 } catch (InterruptedException ex) {
>                   // TODO: The line below is meaningless. Comment it out later
>                   this.permitMon.notifyAll();
>                   throw ex;
>                 }
>               }
>             }
>           } // synchronized (this.permitMon)
>         } // if (putPermits <= 0)
>         --putPermits;
>       } // synchronized (this.putGuard)
>     }
>   }
> *Reporter*: Mangesh Deshmukh
> *E-mail*: [mailto:mdeshmukh@quotient.com]

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
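
For the log statements proposed in the comment on flushBuffer(), one possible shape is sketched below. This is only an illustration under stated assumptions, not Geode code: the class TimedFlushSketch, the FlushStats holder and the timedFlush method are hypothetical names, and the sketch uses only standard java.nio calls. It drains a buffer into a channel while recording how many write() calls were needed, how many of them wrote zero bytes, and how long the whole flush took. On a non-blocking channel a full send buffer would show up as zero-byte writes; on a blocking channel it would show up as a long elapsed time.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-alone sketch; none of these names exist in Geode. */
final class TimedFlushSketch {

  /** What a log statement around the write loop could report. */
  static final class FlushStats {
    final int bytesWritten;
    final int writeCalls;
    final int zeroByteWrites;
    final long elapsedNanos;

    FlushStats(int bytesWritten, int writeCalls, int zeroByteWrites, long elapsedNanos) {
      this.bytesWritten = bytesWritten;
      this.writeCalls = writeCalls;
      this.zeroByteWrites = zeroByteWrites;
      this.elapsedNanos = elapsedNanos;
    }

    @Override
    public String toString() {
      return "bytesWritten=" + bytesWritten + " writeCalls=" + writeCalls
          + " zeroByteWrites=" + zeroByteWrites + " elapsedNanos=" + elapsedNanos;
    }
  }

  /**
   * Writes the remaining bytes of buf (already flipped for reading) to the
   * channel and measures the loop from the application's point of view.
   */
  static FlushStats timedFlush(WritableByteChannel channel, ByteBuffer buf) throws IOException {
    int total = 0;
    int calls = 0;
    int zeroWrites = 0;
    long start = System.nanoTime();
    while (buf.hasRemaining()) {
      int n = channel.write(buf); // may write fewer bytes than remain
      calls++;
      if (n == 0) {
        zeroWrites++; // nothing accepted by the channel on this call
      }
      total += n;
    }
    long elapsed = System.nanoTime() - start;
    return new FlushStats(total, calls, zeroWrites, elapsed);
  }

  public static void main(String[] args) throws IOException {
    // An in-memory sink stands in for the client socket in this demo.
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    ByteBuffer buf = ByteBuffer.wrap("hello".getBytes(StandardCharsets.US_ASCII));
    FlushStats stats = timedFlush(Channels.newChannel(sink), buf);
    System.out.println(stats); // in a custom build this line would go to the server log
  }
}
```

In a custom build the same counters could be logged only when the elapsed time or the number of zero-byte writes crosses a threshold, so that the extra logging does not itself slow down the dispatcher.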