From: Jason Huynh
Date: Wed, 21 Feb 2018 18:53:50 +0000
Subject: Re: [Proposal] Thread monitoring mechanism
To: dev@geode.apache.org
Cc: "user@geode.apache.org"

I am assuming this
would be for all threads/thread pools and not specific to Function threads. I wonder what the impact would be for put/get operations, or are we going to target specific operations?

On Tue, Feb 20, 2018 at 1:04 AM Gregory Vortman wrote:

> Hello team,
> One of the most severe issues hitting our real-time application is threads
> getting stuck for various reasons, such as long-lasting locks, deadlocks,
> or threads that wait forever for a reply when packets are dropped.
> Stuck threads of this kind go under the radar of the existing system
> health check methods.
> In mission-critical applications, this results in an immediate outage.
>
> As a short-term solution, we are implementing an internal watchdog
> mechanism for detecting stuck threads:
> There is a registration object.
> The Function executor has start/end hooks to register/unregister the
> thread via the registration object.
> A customized monitoring scheduled thread is spawned on startup. It wakes
> up every N seconds, scans the registration map, and detects threads that
> have not unregistered for a long time (configurable).
> Once such threads have been detected, a process stack dump is taken and a
> thread stack statistic metric is provided.
>
> This helps us to monitor, detect, and quickly decide which action should
> be taken. Usually the decision is to bounce the member (a consistency
> issue is possible, but in our case that is better than denial of service).
> The above solution does not touch Geode core code; it is implemented
> entirely within our customized code.
>
> I would like to raise a proposal to introduce a long-term, generic thread
> monitoring mechanism to detect threads that are stuck for any reason.
> It would maintain a monitoring object with start/end methods, invoked
> similarly to FunctionStats.startFunctionExecution and
> FunctionStats.endFunctionExecution.
>
> Your feedback would be appreciated.
>
> Thank you for your cooperation.
> Best regards!
>
> Gregory Vortman
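
For readers following the thread, the watchdog described in the quoted proposal (a registration object, start/end hooks, and a scheduled scan every N seconds) could be sketched roughly as below. This is only an illustration under stated assumptions, not code from the proposal and not a Geode API: the class name `StuckThreadMonitor`, its methods, and the choice to print stack traces to stderr are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical watchdog sketch; the names here are assumptions, not Geode classes. */
class StuckThreadMonitor {
    // Registration map: thread -> System.nanoTime() at which it registered.
    private final Map<Thread, Long> startTimes = new ConcurrentHashMap<>();
    private final long thresholdNanos;

    StuckThreadMonitor(long thresholdMillis) {
        this.thresholdNanos = TimeUnit.MILLISECONDS.toNanos(thresholdMillis);
    }

    /** Start hook: call on the worker thread before the monitored work begins. */
    void register() {
        startTimes.put(Thread.currentThread(), System.nanoTime());
    }

    /** End hook: call on the same thread, in a finally block, when the work ends. */
    void unregister() {
        startTimes.remove(Thread.currentThread());
    }

    /** Returns the threads that have stayed registered longer than the threshold. */
    List<Thread> findStuckThreads() {
        long now = System.nanoTime();
        List<Thread> stuck = new ArrayList<>();
        for (Map.Entry<Thread, Long> entry : startTimes.entrySet()) {
            if (now - entry.getValue() > thresholdNanos) {
                stuck.add(entry.getKey());
            }
        }
        return stuck;
    }

    /** Spawns a daemon thread that scans every periodSeconds and prints stuck stacks. */
    ScheduledExecutorService start(long periodSeconds) {
        ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(runnable -> {
                Thread t = new Thread(runnable, "stuck-thread-monitor");
                t.setDaemon(true);
                return t;
            });
        scheduler.scheduleAtFixedRate(() -> {
            for (Thread t : findStuckThreads()) {
                System.err.println("Possibly stuck thread: " + t.getName());
                for (StackTraceElement frame : t.getStackTrace()) {
                    System.err.println("    at " + frame);
                }
            }
        }, periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return scheduler;
    }
}
```

A caller would wrap the monitored work as `monitor.register(); try { ... } finally { monitor.unregister(); }`. The cost on the hot path is one concurrent map put and one remove per operation, which is the overhead the question about put/get operations is concerned with; the sketch does not measure it.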