lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Erick Erickson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-11934) Visit Solr logging, it's too noisy.
Date Sat, 05 May 2018 02:32:00 GMT

    [ https://issues.apache.org/jira/browse/SOLR-11934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16464575#comment-16464575
] 

Erick Erickson commented on SOLR-11934:
---------------------------------------

I looked at sample logs from a client and wrote a little program to bucket them. NOTE: this
was an index-heavy type application. Here are the number of log messages originating in these
files:

        35 occurrences in org.apache.solr.common.cloud.ZkStateReader$StateWatcher;
        48 occurrences in org.apache.solr.core.QuerySenderListener;
        55 occurrences in org.apache.solr.servlet.HttpSolrCall;
     64388 occurrences in org.apache.solr.update.SolrIndexWriter;
     64413 occurrences in org.apache.solr.search.SolrIndexSearcher;
     64601 occurrences in org.apache.solr.core.SolrCore;
    128795 occurrences in org.apache.solr.update.DirectUpdateHandler2;
   3345415 occurrences in org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor;

that got me to thinking about the _value_ of certain messages when troubleshooting. Is the
value in the individual messages or in getting a sense of how things are behaving?

So WDYT about logging some summary information instead of individual messages for high-cardinality
messages? Rather than logging every update at info level, would it be "useful enough" when
troubleshooting to log a summary of updates' vital statistics every N seconds?

Here's what I'm thinking. Pretty often when troubleshooting, we want to correlate heavy request
activity for Solr to something else (GC, slowdowns, recoveries, whatever). Using updates as
an example, what if at info level, we logged something like

INFO  - 2018-05-04 03:35:09.206 over the last T seconds there were N update requests updating
D documents taking min/avg/median/max milliseconds to complete with min/average/median/max
documents per request.

With the new metrics capabilities, that information may be available already, along with lots
of other information (percentiles and the like).

If seeing the underlying individual requests were important, one could turn on finer-grained
logging for this class only. We could even supply commented-out ways to enable voluminous
messages for anything we choose to write summary information out in the log4j2 conifgs.

I totally understand that for gnarly problems you want (and need) voluminous output. I totally
understand that sometimes you can't predict that up front. I totally understand that in some
cases this would introduce an extra round-trip, "you have to enable this option in your logging
configuration, wait for it to happen again and call me".

I also totally understand that in large Solr operations, the log files are very, very hard
to use because there's so much information being dumped by potentially hundreds of Solr instances.

I also totally understand that I've spend more time than I like to remember writing programs
to parse log files to give me summary information like the above to know where to start looking
for problems. Why not have Solr do it?

This example is just for updates, but queries could do something very similar along with whatever
else fits this pattern. Queries certainly come to mind.

The question I'm really asking is if we're logging intelligently or not. Basically we're throwing
everything that anyone _might_ be interested into a log file and then requiring the users
(and consultants) to figure out how to make it useful. We know enough about how this information
is used I think to do some of it up front, always with the option to drop back to the finer-grained
information when necessary.

Anyway, what do people think?

> Visit Solr logging, it's too noisy.
> -----------------------------------
>
>                 Key: SOLR-11934
>                 URL: https://issues.apache.org/jira/browse/SOLR-11934
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Erick Erickson
>            Assignee: Erick Erickson
>            Priority: Major
>
> I think we have way too much INFO level logging. Or, perhaps more correctly, Solr logging
needs to be examined and messages logged at an appropriate level.
> We log every update at an INFO level for instance. But I think we log LIR at INFO as
well. As a sysadmin I don't care to have my logs polluted with a message for every update,
but if I'm trying to keep my system healthy I want to see LIR messages and try to understand
why.
> Plus, in large installations logging at INFO level is creating a _LOT_ of files.
> What I want to discuss on this JIRA is
> 1> What kinds of messages do we want log at WARN, INFO, DEBUG, and TRACE levels?
> 2> Who's the audience at each level? For a running system that's functioning, sysops
folks would really like WARN messages that mean something need attention for instance. If
I'm troubleshooting should I turn on INFO? DEBUG? TRACE?
> So let's say we get some kind of agreement as to the above. Then I propose three things
> 1> Someone (and probably me but all help gratefully accepted) needs to go through
our logging and assign appropriate levels. This will take quite a while, I intend to work
on it in small chunks.
> 2> Actually answer whether unnecessary objects are created when something like log.info("whatever
{}", someObjectOrMethodCall); is invoked. Is this independent on the logging implementation
used? The SLF4J and log4j seem a bit contradictory.
> 3> Maybe regularize log, logger, LOG as variable names, but that's a nit.
> As a tactical approach, I suggest we tag each LoggerFactory.getLogger in files we work
on with //SOLR-(whatever number is assigned when I create this). We can remove them all later,
but since I expect to approach this piecemeal it'd be nice to keep track of which files have
been done already.
> Finally, I really really really don't want to do this all at once. There are 5-6 thousand
log messages. Even at 1,000 a week that's 6 weeks, even starting now it would probably span
the 7.3 release.
> This will probably be an umbrella issue so we can keep all the commits straight and people
can volunteer to "fix the files in core" as a separate piece of work (hint).
> There are several existing JIRAs about logging in general, let's link them in here as
well.
> Let the discussion begin!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message