spark-dev mailing list archives

From: Paul Brown <...@mult.ifario.us>
Subject: Re: 0.9.0 forces log4j usage
Date: Sat, 08 Feb 2014 01:57:26 GMT
Hi, Patrick --

I forget which other component is responsible, but we're using
log4j-over-slf4j as part of an overall requirement to centralize logging,
i.e., *someone* else is logging over log4j and we're pulling that in.
(There's also some jul logging from Jersey, etc.)

Goals:

- Fully control/capture all possible logging.  (God forbid we have to grab
System.out/err, but we'd do it if needed.)
- Use the backend we like best at the moment.  (Happens to be logback.)

Possible cases:

- If Spark used Log4j at all, we would pull in that logging via
log4j-over-slf4j.
- If Spark used only slf4j and referenced no backend, we would use it as-is,
although we'd still have log4j-over-slf4j in place because of other
libraries.
- If Spark used only slf4j and referenced the slf4j-log4j12 backend, we
would exclude that one dependency (via our POM; a sketch of that exclusion
follows).
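
Here's a minimal sketch of that exclusion, shown in sbt form for brevity
(our actual build does the equivalent in the POM with an <exclusion>
element; treat the 0.9.0 coordinates as illustrative):

  // A sketch (not our actual build): pull in spark-core but leave the
  // slf4j backend choice to the application by excluding the
  // slf4j-log4j12 binding that it ships with.
  libraryDependencies += "org.apache.spark" %% "spark-core" % "0.9.0" exclude("org.slf4j", "slf4j-log4j12")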

Best.
-- Paul


—
prb@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/


On Fri, Feb 7, 2014 at 5:38 PM, Patrick Wendell <pwendell@gmail.com> wrote:

> Hey Paul,
>
> So if your goal is ultimately to output to logback, then why don't you
> just use slf4j and logback-classic.jar as described here [1]? Why
> involve log4j-over-slf4j at all?
>
> Let's say we refactored the Spark build so it didn't advertise
> slf4j-log4j12 as a dependency. Would you still be using
> log4j-over-slf4j... or is this just a "fix" to deal with the fact that
> Spark is somewhat log4j-dependent at this point?
>
> [1] http://www.slf4j.org/manual.html
>
> - Patrick
>
> On Fri, Feb 7, 2014 at 5:14 PM, Paul Brown <prb@mult.ifario.us> wrote:
> > Hi, Patrick --
> >
> > That's close but not quite it.
> >
> > The issue that occurs is not the delegation loop mentioned in slf4j
> > documentation.  The stack overflow is entirely within the code in the
> > Spark
> > trait:
> >
> > at org.apache.spark.Logging$class.initializeLogging(Logging.scala:112)
> > at org.apache.spark.Logging$class.initializeIfNecessary(Logging.scala:97)
> > at org.apache.spark.Logging$class.log(Logging.scala:36)
> > at org.apache.spark.SparkEnv$.log(SparkEnv.scala:94)
> >
> >
> > And then that repeats.
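> >
> > A minimal sketch of the recursive shape, simplified from the trait (not
> > the exact 0.9.0 source; names are approximate):
> >
> > import org.apache.log4j.LogManager
> > import org.slf4j.{Logger, LoggerFactory}
> >
> > trait Logging {
> >   private var logger_ : Logger = null
> >
> >   protected def log: Logger = {
> >     if (logger_ == null) {
> >       initializeLogging()                        // (1) first call lands here
> >       logger_ = LoggerFactory.getLogger(getClass)
> >     }
> >     logger_
> >   }
> >
> >   private def initializeLogging() {
> >     // log4j-over-slf4j always reports zero appenders on the root logger,
> >     // so this stays false no matter what the application configured...
> >     val log4jInitialized = LogManager.getRootLogger.getAllAppenders.hasMoreElements
> >     if (!log4jInitialized) {
> >       // ...and logging about it calls log again (2) while logger_ is
> >       // still null, so (1) repeats until the stack overflows.
> >       log.info("Using Spark's default log4j profile")
> >     }
> >   }
> > }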
> >
> > As for our situation, we exclude the slf4j-log4j12 dependency when we
> > import the Spark library (because we don't want to use log4j) and have
> > log4j-over-slf4j already in place to ensure that all of the logging in the
> > overall application runs through slf4j and then out through logback.  (We
> > also, as another poster mentioned, force jcl and jul through slf4j.)
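> >
> > For reference, the jul half of that is the standard jul-to-slf4j bridge;
> > a minimal sketch, assuming the jul-to-slf4j artifact is on the classpath
> > (jcl-over-slf4j needs no code at all):
> >
> > import org.slf4j.bridge.SLF4JBridgeHandler
> >
> > // Drop jul's default console handler, then route jul records to slf4j,
> > // typically once at application startup.
> > SLF4JBridgeHandler.removeHandlersForRootLogger()
> > SLF4JBridgeHandler.install()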
> >
> > The zen of slf4j for libraries is that the library uses the slf4j API and
> > then the enclosing application can route logging as it sees fit.  Spark
> > master CLI would log via slf4j and include the slf4j-log4j12 backend; same
> > for Spark worker CLI.  Spark as a library (versus as a container) would not
> > include any backend to the slf4j API and leave this up to the application.
> > (FWIW, this would also avoid your log4j warning message.)
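> >
> > Concretely, the library side of that zen is just this shape, with no
> > backend referenced anywhere in the library's own code or build (the class
> > here is a made-up example):
> >
> > import org.slf4j.{Logger, LoggerFactory}
> >
> > class SomeSparkComponent {
> >   // The library talks only to the slf4j API; the enclosing application
> >   // decides whether this lands in log4j, logback, or anything else.
> >   private val log: Logger = LoggerFactory.getLogger(classOf[SomeSparkComponent])
> >   def doWork(): Unit = log.info("doing work")
> > }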
> >
> > But as I was saying before, I'd be happy with a situation where I can avoid
> > log4j being enabled or configured, and I think you'll find an existing
> > choice of logging framework to be a common scenario for those embedding
> > Spark in other systems.
> >
> > Best.
> > -- Paul
> >
> > --
> > prb@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/
> >
> >
> > On Fri, Feb 7, 2014 at 3:01 PM, Patrick Wendell <pwendell@gmail.com> wrote:
> >
> >> Paul,
> >>
> >> Looking back at your problem, I think it's the one here:
> >> http://www.slf4j.org/codes.html#log4jDelegationLoop
> >>
> >> So let me just be clear about what you are doing so I understand. You have
> >> some other application that directly calls log4j. So you have to
> >> include log4j-over-slf4j to route those logs through slf4j to logback.
> >>
> >> At the same time you embed Spark in this application. In the past it
> >> was fine, but now that Spark programmatically initializes log4j, it
> >> screws up your application because log4j-over-slf4j doesn't work with
> >> applications that do this explicitly, as discussed here:
> >> http://www.slf4j.org/legacy.html
> >>
> >> Correct?
> >>
> >> - Patrick
> >>
> >> On Fri, Feb 7, 2014 at 2:02 PM, Koert Kuipers <koert@tresata.com> wrote:
> >> > got it. that sounds reasonable
> >> >
> >> >
> >> > On Fri, Feb 7, 2014 at 2:31 PM, Patrick Wendell <pwendell@gmail.com> wrote:
> >> >
> >> >> Koert - my suggestion was this. We let users use any slf4j backend
> >> >> they want. If we detect that they are using the log4j backend and
> >> >> *also* they didn't configure any log4j appenders, we set up some nice
> >> >> defaults for them. If they are using another backend, Spark doesn't
> >> >> try to modify the configuration at all.
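> >> >>
> >> >> A sketch of one way that detection could look, using the technique
> >> >> from the StackOverflow link further down (inspecting the bound
> >> >> ILoggerFactory; treat the factory class name as an assumption):
> >> >>
> >> >> import org.slf4j.LoggerFactory
> >> >>
> >> >> // slf4j-log4j12 binds the slf4j API to org.slf4j.impl.Log4jLoggerFactory,
> >> >> // so the class name of the bound factory reveals the backend in use.
> >> >> val binderName = LoggerFactory.getILoggerFactory.getClass.getName
> >> >> val usingLog4j = binderName == "org.slf4j.impl.Log4jLoggerFactory"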
> >> >>
> >> >> On Fri, Feb 7, 2014 at 11:14 AM, Koert Kuipers <koert@tresata.com> wrote:
> >> >> > well "static binding" is probably the wrong terminology but you get
> >> >> > the idea. multiple backends are not allowed and cause an even uglier
> >> >> > warning...
> >> >> >
> >> >> > see also here:
> >> >> > https://github.com/twitter/scalding/pull/636
> >> >> > and here:
> >> >> > https://groups.google.com/forum/#!topic/cascading-user/vYvnnN_15ls
> >> >> > all me being annoying and complaining about slf4j-log4j12 dependencies
> >> >> > (which did get removed).
> >> >> >
> >> >> >
> >> >> > On Fri, Feb 7, 2014 at 2:09 PM, Koert Kuipers <koert@tresata.com> wrote:
> >> >> >
> >> >> >> the issue is that slf4j uses static binding. you can put only one
> >> >> >> slf4j backend on the classpath, and that's what it uses. more than
> >> >> >> one is not allowed.
> >> >> >>
> >> >> >> so you either keep the slf4j-log4j12 dependency for spark, and then
> >> >> >> you take away people's choice of slf4j backend, which is considered
> >> >> >> bad form for a library, or you do not include it and then people
> >> >> >> will always get the big fat ugly warning and slf4j logging will not
> >> >> >> flow to log4j.
> >> >> >>
> >> >> >> including log4j itself is not necessarily a problem i think?
> >> >> >>
> >> >> >>
> >> >> >> On Fri, Feb 7, 2014 at 1:11 PM, Patrick Wendell <pwendell@gmail.com> wrote:
> >> >> >>
> >> >> >>> This also seems relevant - but not my area of expertise (whether
> >> >> >>> this is a valid way to check this):
> >> >> >>>
> >> >> >>> http://stackoverflow.com/questions/10505418/how-to-find-which-library-slf4j-has-bound-itself-to
> >> >> >>>
> >> >> >>> On Fri, Feb 7, 2014 at 10:08 AM, Patrick Wendell <pwendell@gmail.com> wrote:
> >> >> >>> > Hey Guys,
> >> >> >>> >
> >> >> >>> > Thanks for explaining. Ya this is a problem - we didn't really
> >> >> >>> > know that people are using other slf4j backends. slf4j is in
> >> >> >>> > there for historical reasons, but I think we may assume in a few
> >> >> >>> > places that log4j is being used, and we should minimize those.
> >> >> >>> >
> >> >> >>> > We should patch this and get a fix into 0.9.1. So some solutions
> >> >> >>> > I see are:
> >> >> >>> >
> >> >> >>> > (a) Add a SparkConf option to disable this. I'm fine with this
> >> >> >>> > one.
> >> >> >>> >
> >> >> >>> > (b) Ask slf4j which backend is active and only try to enforce
> >> >> >>> > this default if we know slf4j is using log4j. Do either of you
> >> >> >>> > know if this is possible? Not sure if slf4j exposes this.
> >> >> >>> >
> >> >> >>> > (c) Just remove this default stuff. We'd rather not do this. The
> >> >> >>> > goal of this thing is to provide good usability for people who
> >> >> >>> > have linked against Spark and haven't done anything to configure
> >> >> >>> > logging. For beginners we try to minimize the assumptions about
> >> >> >>> > what else they know about, and I've found log4j configuration is
> >> >> >>> > a huge mental barrier for people who are getting started.
> >> >> >>> >
> >> >> >>> > Paul, if you submit a patch doing (a) we can merge it in. If you
> >> >> >>> > have any idea whether (b) is possible I'd prefer that one, but
> >> >> >>> > it may not be possible or might be brittle.
> >> >> >>> >
> >> >> >>> > - Patrick
> >> >> >>> >
> >> >> >>> > On Fri, Feb 7, 2014 at 6:36 AM, Koert Kuipers <koert@tresata.com> wrote:
> >> >> >>> >> Totally agree with Paul: a library should not pick the slf4j
> >> >> >>> >> backend. It defeats the purpose of slf4j. That big ugly warning
> >> >> >>> >> is there to alert people that it's their responsibility to pick
> >> >> >>> >> the backend...
> >> >> >>> >> On Feb 7, 2014 3:55 AM, "Paul Brown" <prb@mult.ifario.us> wrote:
> >> >> >>> >>
> >> >> >>> >>> Hi, Patrick --
> >> >> >>> >>>
> >> >> >>> >>> From slf4j, you can either backend it into log4j (which is
> >> >> >>> >>> the way that Spark is shipped) or you can route log4j through
> >> >> >>> >>> slf4j and then on to a different backend (e.g., logback).
> >> >> >>> >>> We're doing the latter and manipulating the dependencies in
> >> >> >>> >>> the build because that's the way the enclosing application is
> >> >> >>> >>> set up.
> >> >> >>> >>>
> >> >> >>> >>> The issue with the current situation is that there's no way
> >> >> >>> >>> for an end user to choose to *not* use the log4j backend.  (My
> >> >> >>> >>> short-term solution was to use the Maven shade plugin to swap
> >> >> >>> >>> in a version of the Logging trait with the body of that method
> >> >> >>> >>> commented out.)  In addition to the situation with
> >> >> >>> >>> log4j-over-slf4j and the empty enumeration of ROOT appenders,
> >> >> >>> >>> you might also run afoul of someone who intentionally
> >> >> >>> >>> configured log4j with an empty set of appenders at the time
> >> >> >>> >>> that Spark is initializing.
> >> >> >>> >>>
> >> >> >>> >>> I'd be happy with any implementation that lets me choose my
> >> >> >>> >>> logging backend: override default behavior via system
> >> >> >>> >>> property, plug-in architecture, etc.  I do think it's
> >> >> >>> >>> reasonable to expect someone digesting a substantial JDK-based
> >> >> >>> >>> system like Spark to understand how to initialize logging --
> >> >> >>> >>> surely they're using logging of some kind elsewhere in their
> >> >> >>> >>> application -- but if you want the default behavior there as a
> >> >> >>> >>> courtesy, it might be worth putting an INFO (versus the
> >> >> >>> >>> glaring log4j WARN) message on the output that says something
> >> >> >>> >>> like "Initialized default logging via Log4J; pass
> >> >> >>> >>> -Dspark.logging.loadDefaultLogger=false to disable this
> >> >> >>> >>> behavior." so that it's both convenient and explicit.
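> >> >> >>> >>>
> >> >> >>> >>> A sketch of what that guard-plus-INFO could look like (the
> >> >> >>> >>> property name is the one proposed in this thread; the
> >> >> >>> >>> surrounding variables are hypothetical):
> >> >> >>> >>>
> >> >> >>> >>> // Apply Spark's default log4j setup only if the user hasn't opted out.
> >> >> >>> >>> val loadDefaults =
> >> >> >>> >>>   System.getProperty("spark.logging.loadDefaultLogger", "true").toBoolean
> >> >> >>> >>> if (loadDefaults && !log4jInitialized) {
> >> >> >>> >>>   // ...load the default log4j properties, then announce it politely:
> >> >> >>> >>>   log.info("Initialized default logging via Log4J; pass " +
> >> >> >>> >>>     "-Dspark.logging.loadDefaultLogger=false to disable this behavior.")
> >> >> >>> >>> }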
> >> >> >>> >>>
> >> >> >>> >>> Cheers.
> >> >> >>> >>> -- Paul
> >> >> >>> >>>
> >> >> >>> >>> --
> >> >> >>> >>> prb@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/
> >> >> >>> >>>
> >> >> >>> >>>
> >> >> >>> >>> On Fri, Feb 7, 2014 at 12:05 AM, Patrick Wendell <pwendell@gmail.com> wrote:
> >> >> >>> >>>
> >> >> >>> >>> > A config option e.g. could just be to add:
> >> >> >>> >>> >
> >> >> >>> >>> > spark.logging.loadDefaultLogger (default true)
> >> >> >>> >>> > If set to true, Spark will try to initialize a log4j logger
> >> >> >>> >>> > if none is detected. Otherwise Spark will not modify logging
> >> >> >>> >>> > behavior.
> >> >> >>> >>> >
> >> >> >>> >>> > Then users could just set this to false if they have a
> >> >> >>> >>> > logging set-up that conflicts with this.
> >> >> >>> >>> >
> >> >> >>> >>> > Maybe there is a nicer fix...
> >> >> >>> >>> >
> >> >> >>> >>> > On Fri, Feb 7, 2014 at 12:03 AM, Patrick Wendell <pwendell@gmail.com> wrote:
> >> >> >>> >>> > > Hey Paul,
> >> >> >>> >>> > >
> >> >> >>> >>> > > Thanks for digging this up. I worked on this feature and
> >> >> >>> >>> > > the intent was to give users good default behavior if they
> >> >> >>> >>> > > didn't include any logging configuration on the classpath.
> >> >> >>> >>> > >
> >> >> >>> >>> > > The problem with assuming that CL tooling is going to do
> >> >> >>> >>> > > the job is that many people link against Spark as a
> >> >> >>> >>> > > library and run their application using their own scripts.
> >> >> >>> >>> > > In this case the first thing people see when they run an
> >> >> >>> >>> > > application that links against Spark is a big ugly logging
> >> >> >>> >>> > > warning.
> >> >> >>> >>> > >
> >> >> >>> >>> > > I'm not super familiar with log4j-over-slf4j, but this
> >> >> >>> >>> > > behavior of returning null for the appenders seems a
> >> >> >>> >>> > > little weird. What is the use case for using this and not
> >> >> >>> >>> > > just directly using slf4j-log4j12 like Spark itself does?
> >> >> >>> >>> > >
> >> >> >>> >>> > > Did you have a more general fix for this in mind? Or was
> >> >> >>> >>> > > your plan to just revert the existing behavior... We might
> >> >> >>> >>> > > be able to add a configuration option to disable this
> >> >> >>> >>> > > logging default stuff. Or we could just rip it out - but
> >> >> >>> >>> > > I'd like to avoid that if possible.
> >> >> >>> >>> > >
> >> >> >>> >>> > > - Patrick
> >> >> >>> >>> > >
> >> >> >>> >>> > > On Thu, Feb 6, 2014 at 11:41 PM, Paul Brown <prb@mult.ifario.us> wrote:
> >> >> >>> >>> > >> We have a few applications that embed Spark, and in
> >> >> >>> >>> > >> 0.8.0 and 0.8.1, we were able to use slf4j, but 0.9.0
> >> >> >>> >>> > >> broke that and unintentionally forces direct use of
> >> >> >>> >>> > >> log4j as the logging backend.
> >> >> >>> >>> > >>
> >> >> >>> >>> > >> The issue is here in the org.apache.spark.Logging trait:
> >> >> >>> >>> > >>
> >> >> >>> >>> > >> https://github.com/apache/incubator-spark/blame/master/core/src/main/scala/org/apache/spark/Logging.scala#L107
> >> >> >>> >>> > >>
> >> >> >>> >>> > >> log4j-over-slf4j *always* returns an empty enumeration
> >> >> >>> >>> > >> for appenders to the ROOT logger:
> >> >> >>> >>> > >>
> >> >> >>> >>> > >> https://github.com/qos-ch/slf4j/blob/master/log4j-over-slf4j/src/main/java/org/apache/log4j/Category.java?source=c#L81
> >> >> >>> >>> > >>
> >> >> >>> >>> > >> And this causes an infinite loop and an eventual stack
> >> >> >>> >>> > >> overflow.
> >> >> >>> >>> > >>
> >> >> >>> >>> > >> I'm happy to submit a Jira and a patch, but it would be a
> >> >> >>> >>> > >> significant enough reversal of recent changes that it's
> >> >> >>> >>> > >> probably worth discussing before I sink a half hour into
> >> >> >>> >>> > >> it.  My suggestion would be that initialization (or not)
> >> >> >>> >>> > >> should be left to the user, with reasonable default
> >> >> >>> >>> > >> behavior supplied by the Spark commandline tooling and
> >> >> >>> >>> > >> not forced on applications that incorporate Spark.
> >> >> >>> >>> > >>
> >> >> >>> >>> > >> Thoughts/opinions?
> >> >> >>> >>> > >>
> >> >> >>> >>> > >> -- Paul
> >> >> >>> >>> > >> --
> >> >> >>> >>> > >> prb@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/
