From: Sean Busbey
Date: Fri, 18 Apr 2014 21:46:42 -0500
Subject: Re: increasing balancing problems to WARN
To: dev@accumulo.apache.org

I also try to limit what goes at higher warning levels. One of my goals over
the next few months is to improve our current logging. It sounds like this is
a good time to make sure we're on the same page.

We're going to have to train users on something (esp. since our current
logging is very noisy).

The short version I like is "Info and more severe are for operators; info and
less severe are for developers."

Here's what I usually use as a guideline (constrained to slf4j levels):

= ERROR

Something is wrong and an operator needs to do something, preferably very
soon. In other words, if I were on call I'd expect to get paged.

= WARN

Something is amiss, but not of immediate concern. An operator who is on call
but not busy at the moment might want to investigate some kind of underlying
issue, but the system will continue to function within some reasonable bound.

= INFO

Summary information about normal operations that is safe to ignore. GC
information, throughput stats, that kind of thing.

= DEBUG

Low level information that is not normally useful, but will help determine the
cause of a system malfunction. Usually something a developer or tier 3
supporter would want when something was going wrong (e.g. stack traces).

= TRACE

Detailed low level information at a volume that probably can't be gathered in
production.

Eric, do those all sound reasonable? I want to make sure we have a common
basis before I get into the specifics of this case.

-Sean

On Fri, Apr 18, 2014 at 8:21 PM, Eric Newton wrote:

> -1
>
> I would hesitate to put *any* message at WARN. It is normal for balancing
> to take a little while, especially for some of my users who have their own
> balancing algorithm.
>
> Users feel the need to fix the problem; after all, it's there in big scary
> yellow on the monitor page. I don't like training users to ignore scary
> yellow. Is it a problem, or not?
>
> Alternatively, put the balance info into the master status, and display it.
> Like GC collection time... hey, I've been migrating these tablets for a
> long time... turn yellow/red.
>
> -Eric
>
> On Fri, Apr 18, 2014 at 4:03 PM, Sean Busbey wrote:
>
> > At the moment all of our logs about problems balancing are at DEBUG.
> >
> > Given the impact to a cluster when this happens (skewing load onto few
> > servers, in some cases severely), I'd like to raise it to WARN so that it
> > surfaces for operators in the Monitor and in the non-debug log.
> >
> > Thought I'd do a quick lazy consensus check before filing a jira and
> > taking care of it.
> >
> > --
> > Sean

--
Sean