From: Nikolay Izhikov
Subject: Re: Monitoring of active transactions
To: dev@ignite.apache.org
Message-ID: <4cabb8f1-95f7-3059-459b-c8453d85e07a@gmail.com>
Date: Fri, 8 Sep 2017 14:57:36 +0300

Hello, Ilya.

Great! Thanks!

Can I extend your idea a bit?

I think it would be very useful to also monitor all user-provided listeners and callbacks, to handle the following scenarios:

1a. A user starts a job with ExecutorService, IgniteCompute or similar.
1b. A user creates a ContinuousQuery with a remoteFilter and a localListener.
2. A user callback takes a huge amount of time to execute on some node, or its thread blocks on some monitor inside the callback.

In that case Ignite can detect it and print a warning message. In some cases we can cancel the user callback to free resources. Specific timeouts and the cancel policy should be configured by the user.

Maybe this is already covered by FailoverSpi [1], but I can't find a description of such a feature.

We can take the WebSphere hang detection mechanism [2], [3] as an example.

[1] https://apacheignite.readme.io/docs/fault-tolerance
[2] https://www.ibm.com/developerworks/community/blogs/aimsupport/entry/hung_thread_detection_in_websphere_application_server?lang=en
[3] https://www.ibm.com/support/knowledgecenter/SSAW57_8.5.5/com.ibm.websphere.nd.iseries.doc/ae/ttrb_confighangdet.html

On 08.09.2017 14:27, Ilya Lantukh wrote:
> Igniters,
>
> According to our current design and implementation, unclosed transaction or
> unreleased lock can hang ignite cluster forever.
> This is logical, and with
> correct usage of those mechanics such issue should never happen, in real
> world developers can make mistakes and leave transaction open. We have a
> feature "transaction timeout", but turns out it doesn't work in all cases
> (see https://issues.apache.org/jira/browse/IGNITE-6181). Even if all known
> issues are fixed, there is still a lot of room for mistake and incorrect
> usage.
>
> To make it possible for Ignite users to discover such problem and trace it
> to a particular part of code, I've created a very simple utility that
> collects and prints information about long running transactions for the
> whole cluster. It is available here:
> https://github.com/ilantukh/IgniteTxViewer.
>
> One might expect such monitoring utilities to be included in Ignite
> codebase. Personally, I think that such information should be available
> from public API, without using of additional applications or diving into
> Ignite internals.
>
> What do you think?
>
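
To make the callback-monitoring proposal in the reply above more concrete, here is a minimal sketch in plain java.util.concurrent. It is not an existing Ignite API: the class name CallbackWatchdog, its run method, and the timeoutMs / cancelOnTimeout settings are all hypothetical and stand in for the user-configured timeout and cancel policy. It runs a user callback on a separate thread, prints a warning if the callback exceeds the timeout, and optionally tries to cancel it.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper (not part of Ignite): detects user callbacks that run too long. */
class CallbackWatchdog {
    /** Daemon threads, so a hung callback does not keep the JVM alive. */
    private final ExecutorService pool = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "user-callback");
        t.setDaemon(true);
        return t;
    });

    private final long timeoutMs;          // assumed user-configured timeout
    private final boolean cancelOnTimeout; // assumed user-configured cancel policy

    CallbackWatchdog(long timeoutMs, boolean cancelOnTimeout) {
        this.timeoutMs = timeoutMs;
        this.cancelOnTimeout = cancelOnTimeout;
    }

    /** Runs the callback; returns true if it finished within the timeout, false otherwise. */
    boolean run(String name, Runnable callback) {
        Future<?> fut = pool.submit(callback);
        try {
            fut.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            System.err.println("WARNING: callback '" + name
                + "' is still running after " + timeoutMs + " ms");
            if (cancelOnTimeout) {
                // Interrupts the worker thread; this only helps if the callback
                // reacts to interruption. A thread blocked on a monitor stays blocked.
                fut.cancel(true);
            }
            return false;
        } catch (InterruptedException e) {
            // The waiting (caller) thread was interrupted: restore the flag and give up.
            Thread.currentThread().interrupt();
            fut.cancel(true);
            return false;
        } catch (ExecutionException e) {
            throw new IllegalStateException("callback '" + name + "' failed", e.getCause());
        }
    }
}
```

For example, new CallbackWatchdog(200, true).run("slow-listener", listener) would print a warning and return false if listener takes longer than 200 ms. Note the limitation called out in the code comment: interruption cannot release a thread that is blocked on a monitor, so for that scenario a real implementation could only report the hang (for instance with a thread dump), not cancel it.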