Date: Tue, 10 Feb 2015 09:25:59 -0500
Subject: Re: DUCC- Heartbeat Packets?
From: Jaroslaw Cwiklik
To: user@uima.apache.org

1. What are Heartbeat Packets?

The DUCC Agent publishes node metrics at regular intervals. The information
includes node identification, OS info, memory, etc. It is consumed by the RM
and the WS (web server). If the RM sees no publication from a node within a
configurable window, it marks the node as down. The status of all nodes is
available in the DUCC Monitor.

2. Are they the same as defined at this URL: http://250bpm.com/blog:22?

No.

3. How do daemons broadcast a heartbeat?

The Agent publishes node metrics to a well-known JMS topic.

4. How do Agent nodes send heartbeat packets?

See #3.

On Tue, Feb 10, 2015 at 7:06 AM, reshu.agarwal wrote:

> Hi,
>
> I read in the DUCC book:
>
> Agents monitor nodes, sending heartbeat packets with node statistics to
> interested components (such as the RM and web-server).
>
> Status
>
> This shows the current state of a machine. Values include:
>
> defined
>     The node is in the DUCCnodes file, but no DUCC process has been
>     started there, or else there is a communication problem and the
>     state messages are not being delivered.
> up
>     The node has a DUCC Agent process running on it and the web
>     server is receiving regular heartbeat packets from it.
> down
>     The node had a healthy DUCC Agent on it at some point in the
>     past (since the last DUCC boot), but the web server has stopped
>     receiving heartbeats from it.
>
>     The agent may have been manually shut down, may have crashed, or
>     there may be a communication problem.
>
>     Additionally, very heavy loads from jobs running on the node
>     can cause the DUCC Agent's heartbeats to be delayed.
>
> I have some questions:
>
> 1. What are Heartbeat Packets?
> 2. Are they the same as defined at this URL: http://250bpm.com/blog:22?
> 3. How do daemons broadcast a heartbeat?
> 4. How do Agent nodes send heartbeat packets?
>
> I ask because my DUCC Agents were going down again and again for a
> particular time period.
>
> 5. How can I identify whether the Agents were going down due to a
>    network issue?
>
> Thanks in advance.
>
> Reshu.
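
For illustration, the timeout behaviour described in answer 1 (a consumer marks a node down when no publication arrives within a configurable window) can be sketched roughly as below. This is a minimal sketch and not DUCC code: the class HeartbeatMonitor, its methods, the node names and the 60-second window are all hypothetical, and the JMS transport is left out entirely. Only the status names "defined", "up" and "down" come from the quoted DUCC book text.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class HeartbeatMonitor:
    """Hypothetical consumer-side tracker; not part of DUCC."""

    window_seconds: float  # assumed configurable timeout window
    last_seen: Dict[str, float] = field(default_factory=dict)

    def on_heartbeat(self, node: str, now: float) -> None:
        # Record the arrival time of a node-metrics publication.
        self.last_seen[node] = now

    def status(self, node: str, now: float) -> str:
        # "defined": known node, but no heartbeat has been received yet.
        if node not in self.last_seen:
            return "defined"
        # "up": the last heartbeat arrived within the window.
        if now - self.last_seen[node] <= self.window_seconds:
            return "up"
        # "down": heartbeats were seen before, but not recently enough.
        return "down"


monitor = HeartbeatMonitor(window_seconds=60.0)
monitor.on_heartbeat("node-a", now=0.0)

status_early = monitor.status("node-a", now=30.0)     # within the window
status_late = monitor.status("node-a", now=120.0)     # window exceeded
status_unknown = monitor.status("node-b", now=120.0)  # never heard from
```

In a tracker like this, a late heartbeat and a lost one look identical to the consumer, which matches the quoted note that a shutdown, a crash, a communication problem or a heavily loaded node can all show up as "down".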