MIME-Version: 1.0
In-Reply-To: <20141106204758.10455.86555@reviews.apache.org>
References: <20141106174751.10454.96318@reviews.apache.org> <20141106204758.10455.86555@reviews.apache.org>
Date: Thu, 6 Nov 2014 19:29:29 -0500
Message-ID:
Subject: Re: Review Request 27654: Add introspection of long running assignments
From: Eric Newton
To: dev@accumulo.apache.org, Josh Elser
Cc: "R. Keith Turner"
Content-Type: multipart/alternative; boundary=001a11c1f022f59a68050739e79a

--001a11c1f022f59a68050739e79a
Content-Type: text/plain; charset=UTF-8

It would be nice to model "Danger!" messages with "All Clear!" directly.
I'll make a ticket.

On Thu, Nov 6, 2014 at 3:47 PM, Josh Elser wrote:

> > On Nov. 6, 2014, 5:47 p.m., kturner wrote:
> > > server/tserver/src/main/java/org/apache/accumulo/tserver/TabletServerResourceManager.java, line 250
> > > <https://reviews.apache.org/r/27654/diff/3/?file=751140#file751140line250>
> > >
> > >     The compaction code remembers when it logged an exception and does not do it again. It also logs a message if the compaction becomes unstuck. An advantage I thought of w/ repeatedly logging, is that you could see the stack trace changing (or not).
> > >
> > >     The stack trace is a possible trace. By the time logging happens, the assignment could have completed and the thread could have moved on to other things.
> >
> > Josh Elser wrote:
> >     Yeah, since these are running fairly regularly (order of seconds) a stuck assignment could get really spammy. Like you point out, there could be value gained from printing out the stack more than once. Maybe I could add some backoff which only warns so often?
> >
> >     bq. By the time logging happens, the assignment could have completed and the thread could have moved on to other things.
> >
> >     Do you think the message should be updated to be more clear about this? A "Maybe you should look into this" type message?
> >
> > kturner wrote:
> >     > a stuck assignment could get really spammy
> >
> >     I think that spam is probably ok as long as the default is high enough such that when it does happen, its something to be concerned about. Could make the timer check a little less frequently.
> >
> >     > Do you think the message should be updated to be more clear about this?
> >
> >     I think compaction code just says its a possible stack trace. I suppose a good solution would be to have error codes, then user can look up error code and get nitty gritty details. Can't really put too much info in log message.
> >
> > Josh Elser wrote:
> >     bq. Could make the timer check a little less frequently.
> >
> >     As long as we have a long threshold for warning about a stuck assignment, we can easily make a longer period on the timer. The timer period dictates the minimum stuck assignment time -- I can update the description with a clarification.
> >
> > kturner wrote:
> >     I was thinking that once an assignment is considered stuck, that each time the timer kicks a check (I think its either 5 secs or 1 sec, not sure) that it will cause a spam. Was thinking this could be increased to produce less spam. The period of the timer could be a function of tserver.assignment.duration.warning, like 1/4 or 1/2.
>
> bq. The period of the timer could be a function of tserver.assignment.duration.warning, like 1/4 or 1/2.
>
> That would work, unless the user changed the value of the duration warning. It would still fire at the old period (unless I'm much trickier about scheduling the task to run).
>
> Regardless need to think some more about preventing spam.
>
> - Josh
>
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/27654/#review60185
> -----------------------------------------------------------
>
>
> On Nov. 6, 2014, 12:58 a.m., Josh Elser wrote:
> >
> > -----------------------------------------------------------
> > This is an automatically generated e-mail. To reply, visit:
> > https://reviews.apache.org/r/27654/
> > -----------------------------------------------------------
> >
> > (Updated Nov. 6, 2014, 12:58 a.m.)
> >
> >
> > Review request for accumulo.
> >
> >
> > Bugs: ACCUMULO-3304
> >     https://issues.apache.org/jira/browse/ACCUMULO-3304
> >
> >
> > Repository: accumulo
> >
> >
> > Description
> > -------
> >
> > Watches assignments and reports when an assignment is running for longer than a configured time.
> >
> >
> > Diffs
> > -----
> >
> >   core/src/main/java/org/apache/accumulo/core/conf/Property.java 56f3d9c
> >   server/tserver/src/main/java/org/apache/accumulo/tserver/ActiveAssignmentRunnable.java PRE-CREATION
> >   server/tserver/src/main/java/org/apache/accumulo/tserver/RunnableStartedAt.java PRE-CREATION
> >   server/tserver/src/main/java/org/apache/accumulo/tserver/TabletServer.java 94be0bb
> >   server/tserver/src/main/java/org/apache/accumulo/tserver/TabletServerResourceManager.java 935ffeb
> >
> > Diff: https://reviews.apache.org/r/27654/diff/
> >
> >
> > Testing
> > -------
> >
> > Very minimal.
> >
> >
> > Thanks,
> >
> > Josh Elser
> >
> >

--001a11c1f022f59a68050739e79a--
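
For reference, the ideas raised in this thread (warn only so often via a backoff, and pair each "Danger!" warning with an "All Clear!" once the assignment finishes) could be combined roughly as follows. This is only a sketch under stated assumptions, not the code in the review: the class StuckAssignmentWarner, its methods, the in-memory log list and the doubling backoff are all hypothetical and do not exist in Accumulo. Time is passed in explicitly so the behavior does not depend on how often the timer fires.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of Accumulo): rate-limits "stuck assignment" warnings. */
class StuckAssignmentWarner {
  /** Per-assignment warning state. */
  private static final class State {
    long nextWarnAt; // earliest time (ms) at which another warning may be logged
    long backoff;    // gap (ms) to wait after the next warning
    State(long nextWarnAt, long backoff) {
      this.nextWarnAt = nextWarnAt;
      this.backoff = backoff;
    }
  }

  private final long warnThresholdMs; // plays the role of the duration-warning setting (assumed)
  private final long maxBackoffMs;    // upper bound on the gap between repeated warnings
  private final Map<String, State> warned = new HashMap<>();
  final List<String> log = new ArrayList<>(); // stand-in for a real logger

  StuckAssignmentWarner(long warnThresholdMs, long maxBackoffMs) {
    this.warnThresholdMs = warnThresholdMs;
    this.maxBackoffMs = maxBackoffMs;
  }

  /** Call on every timer tick for each assignment that is still running. */
  void check(String tablet, long startedAtMs, long nowMs) {
    long running = nowMs - startedAtMs;
    if (running < warnThresholdMs) {
      return; // not considered stuck yet
    }
    State s = warned.get(tablet);
    if (s == null) {
      s = new State(nowMs, warnThresholdMs); // first warning is due immediately
      warned.put(tablet, s);
    }
    if (nowMs < s.nextWarnAt) {
      return; // still backing off, stay quiet on this tick
    }
    log.add("WARN assignment of " + tablet + " has been running for " + running
        + " ms (a possible stack trace would be logged here)");
    s.nextWarnAt = nowMs + s.backoff;
    s.backoff = Math.min(s.backoff * 2, maxBackoffMs); // double the gap, up to the cap
  }

  /** Call when an assignment finishes; logs an all-clear only if a warning was logged. */
  void completed(String tablet, long nowMs) {
    if (warned.remove(tablet) != null) {
      log.add("INFO assignment of " + tablet + " completed at " + nowMs + " ms (all clear)");
    }
  }
}
```

Because the backoff is tracked per assignment against the supplied timestamps, the timer period only bounds how late a warning can be, and a change to the threshold would not require rescheduling the timer task in this sketch.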