incubator-crunch-dev mailing list archives

From Josh Wills <jwi...@cloudera.com>
Subject Re: Logging/Debugging
Date Sun, 23 Sep 2012 16:39:57 GMT
Inlined.

On Sun, Sep 23, 2012 at 2:11 AM, Matthias Friedrich <matt@mafr.de> wrote:
> On Saturday, 2012-09-22, Josh Wills wrote:
> [...]
>> Perhaps I'm being dense-- once we remove log4j.properties from the
>> core, won't we stop logging the Crunch status information unless the
>> developer explicitly configures it in their own log4j.properties?
>> That may of course be what the developer desires, but my thought was
>> that we typically do want that information logged, and that
>> repeating it in every log4j.properties file for every project would
>> be tedious.
>
> At second glance it seems there are a few misconceptions on how
> logging works with Hadoop. When running from the IDE with
> LocalJobRunner, log4j.properties has no effect because there's no
> log4j on the classpath; commons-logging just uses java.util.logging
> which logs on INFO level. Users who have log4j on the classpath will
> see different results, but they will have to add it themselves, as
> neither hadoop-client nor hadoop-core will provide it for them.
>
> When running jobs using "hadoop jar", our log4j.properties doesn't
> have an effect either because Hadoop's conf/log4j.properties takes
> precedence (even with HADOOP_USER_CLASSPATH_FIRST). This surprises
> me, to be honest, but Crunch's INFO messages are still printed, which
> is good. What also surprises me is that the log4j setup code in
> enableDebug() doesn't seem to have any effect; AFAICS, Hadoop INFO
> messages already appear on the console. All I get when calling
> enableDebug() is a log message "Could not find console appender 'A'".
>
> Is there something broken? What's the intention behind the log4j code
> in enableDebug()? Unless I'm overlooking something (quite possible)
> this seems like a no-op.

Ah, okay. So what we want for debugging is the Hadoop WARN logs. When
a Hadoop job fails on the cluster, those logs are already available on
the JobTracker web page (at least they are in CDH; I assume it works
the same way in Hadoop 1.0.3), so enableDebug doesn't do anything for
us there, besides altering the Configuration to force Crunch to put
try-catch blocks around the DoNode tasks, which I assume still works
fine. I use enableDebug to force the logging of Hadoop WARN statements
on my machine when I'm testing out pipelines, so in that case it only
affects LocalJobRunner.

Given that, what's the best approach here? A Javadoc comment on the
method describing its intended use, or is there a better option?
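
For the local case described above, here is a minimal sketch of how
Hadoop WARN output could be pinned to the console when commons-logging
falls back to java.util.logging (the no-log4j LocalJobRunner setup
Matthias describes). This is not the code in enableDebug(): the class
name LocalWarnLogging and the method enableWarnLogging() are
hypothetical, and the sketch assumes Hadoop's loggers are named after
classes under the org.apache.hadoop package.

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper, not part of Crunch or Hadoop.
final class LocalWarnLogging {

    // java.util.logging only holds loggers weakly, so keep a strong
    // reference or the configuration below may be garbage collected.
    private static final Logger HADOOP_LOGGER =
            Logger.getLogger("org.apache.hadoop"); // assumed logger namespace

    private LocalWarnLogging() {
    }

    // Pins the assumed Hadoop namespace at WARNING and routes it to the console.
    static Logger enableWarnLogging() {
        HADOOP_LOGGER.setLevel(Level.WARNING);

        // Attach a console handler only once, so repeated calls stay idempotent.
        boolean hasConsoleHandler = false;
        for (Handler handler : HADOOP_LOGGER.getHandlers()) {
            if (handler instanceof ConsoleHandler) {
                hasConsoleHandler = true;
            }
        }
        if (!hasConsoleHandler) {
            ConsoleHandler console = new ConsoleHandler();
            console.setLevel(Level.WARNING);
            HADOOP_LOGGER.addHandler(console);
        }

        // Do not also forward records to the root logger's handlers,
        // which would print each message twice.
        HADOOP_LOGGER.setUseParentHandlers(false);
        return HADOOP_LOGGER;
    }
}
```

Calling LocalWarnLogging.enableWarnLogging() once before running a
pipeline locally would, under these assumptions, show WARNING and
SEVERE records from that namespace and drop its INFO output. With
log4j on the classpath the equivalent setup would go through log4j
instead.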

>
>> Assuming that is the case, perhaps we could add a default
>> log4j.properties file to the maven archetype that generates Crunch
>> projects?
>
> Ah, that reminds me: We haven't decided yet if we want an archetype in
> Crunch.

I want one. I thought you created it? I remember seeing an email-- if
I didn't reply, it was b/c I was in the midst of that crazy travel
week and my sleep schedule was off (honestly, I'm just now
recovering.)

>
> Regards,
>   Matthias



-- 
Director of Data Science
Cloudera
Twitter: @josh_wills
