From: Sean Busbey <busbey@cloudera.com>
Date: Fri, 18 Apr 2014 21:46:42 -0500
Subject: Re: increasing balancing problems to WARN
To: "dev@accumulo.apache.org" <dev@accumulo.apache.org>

I also try to limit what goes in at the higher severity levels. One of my goals
over the next few months is to improve our current logging, so this seems
like a good time to make sure we're on the same page.

We're going to have to train users on something (especially since our current
logging is very noisy). The short version I like is "INFO and more severe
are for operators; INFO and less severe are for developers."

Here's what I usually use as a guideline (constrained to slf4j levels):


= ERROR

Something is wrong and an operator needs to do something, preferably very
soon. In other words, if I were on call I'd expect to get paged.

= WARN

Something is amiss, but not of immediate concern. An operator who is on
call but not busy at the moment might want to investigate some kind of
underlying issue, but the system will continue to function within some
reasonable bound.

= INFO

Summary information about normal operations that is safe to ignore. GC
information, throughput stats, that kind of thing.

= DEBUG

Low-level information that is not normally useful, but will help determine
the cause of a system malfunction. Usually something a developer or tier 3
supporter would want when something goes wrong (e.g. stack traces).

= TRACE

Detailed low level information at a volume that probably can't be gathered
in production.
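To make the split concrete, here is a minimal, self-contained Java sketch that encodes the guideline above: which audience each level is for, which level should page someone, and which levels are expected to be collectable in production. The class name `LogLevelGuideline` and its helper methods are hypothetical, not part of Accumulo or slf4j; only the five level names come from slf4j.

```java
/** Hypothetical encoding of the proposed log-level guideline; not an Accumulo or slf4j API. */
public class LogLevelGuideline {

    /** The five slf4j level names, declared from most to least severe (ordinal order). */
    enum Level { ERROR, WARN, INFO, DEBUG, TRACE }

    /** "INFO and more severe are for operators": ERROR, WARN, INFO. */
    static boolean forOperators(Level level) {
        return level.compareTo(Level.INFO) <= 0;
    }

    /** "INFO and less severe are for developers": INFO, DEBUG, TRACE. */
    static boolean forDevelopers(Level level) {
        return level.compareTo(Level.INFO) >= 0;
    }

    /** Only ERROR should page whoever is on call. */
    static boolean pagesOnCall(Level level) {
        return level == Level.ERROR;
    }

    /** TRACE is treated as too voluminous to gather in production ("probably can't be"). */
    static boolean collectableInProduction(Level level) {
        return level != Level.TRACE;
    }

    public static void main(String[] args) {
        // Print one row per level summarizing the guideline.
        for (Level level : Level.values()) {
            System.out.printf("%-5s operators=%b developers=%b pages=%b production=%b%n",
                level, forOperators(level), forDevelopers(level),
                pagesOnCall(level), collectableInProduction(level));
        }
    }
}
```

INFO is the one level that both audiences share, which is why it matters that operator-facing INFO stays "safe to ignore" while WARN and ERROR stay rare.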


Eric, do those all sound reasonable? I want to make sure we have a common
basis before I get into the specifics of this case.

-Sean

On Fri, Apr 18, 2014 at 8:21 PM, Eric Newton <eric.newton@gmail.com> wrote:

> -1
>
> I would hesitate to put *any* message at WARN. It is normal for balancing
> to take a little while, especially for some of my users who have their own
> balancing algorithm.
>
> Users feel the need to fix the problem; after all, it's there in big scary
> yellow on the monitor page.   I don't like training users to ignore scary
> yellow.  Is it a problem, or not?
>
> Alternatively, put the balance info into the master status, and display it.
>  Like GC collection time... hey, I've been migrating these tablets for a
> long time... turn yellow/red.
>
> -Eric
>
>
>
>
> On Fri, Apr 18, 2014 at 4:03 PM, Sean Busbey <busbey@cloudera.com> wrote:
>
> > At the moment all of our logs about problems balancing are at DEBUG.
> >
> > Given the impact to a cluster when this happens (skewing load onto a few
> > servers, in some cases severely), I'd like to raise it to WARN so that it
> > surfaces for operators in the Monitor and in the non-debug log.
> >
> > Thought I'd do a quick lazy consensus check before filing a jira and
> > taking care of it.
> >
> > --
> > Sean
> >
>


-- 
Sean
