incubator-crunch-dev mailing list archives

From Josh Wills <>
Subject Re: Logging/Debugging
Date Sun, 23 Sep 2012 16:39:57 GMT

On Sun, Sep 23, 2012 at 2:11 AM, Matthias Friedrich <> wrote:
> On Saturday, 2012-09-22, Josh Wills wrote:
> [...]
>> Perhaps I'm being dense-- once we remove log4j.properties from the
>> core, won't we stop logging the Crunch status information unless the
>> developer explicitly configures it in their own log4j.properties?
>> That may of course be what the developer desires, but my thought was
>> that we typically do want that information logged, and that
>> repeating it in every log4j.properties file for every project would
>> be tedious.
> At second glance it seems there are a few misconceptions on how
> logging works with Hadoop. When running from the IDE with
> LocalJobRunner, log4j.properties has no effect because there's no
> log4j on the classpath; commons-logging just uses java.util.logging
> which logs on INFO level. Users who have log4j on the classpath will
> see different results, but they will have to add it themselves, as
> neither hadoop-client nor hadoop-core will provide it for them.
> When running jobs using "hadoop jar", our log4j.properties doesn't
> have an effect either because Hadoop's conf/log4j.properties takes
> precedence (even with HADOOP_USER_CLASSPATH_FIRST). This surprises
> me, to be honest, but Crunch's INFO messages are still printed, which
> is good. What also surprises me is that the log4j setup code in
> enableDebug() doesn't seem to have any effect; AFAICS, Hadoop INFO
> messages already appear on the console. All I get when calling
> enableDebug() is a log message "Could not find console appender 'A'".
> Is there something broken? What's the intention behind the log4j code
> in enableDebug()? Unless I'm overlooking something (quite possible)
> this seems like a no-op.
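
To make the above concrete, here is a minimal sketch of the kind of
default log4j.properties under discussion (the appender name, pattern,
and levels are assumptions for illustration, not Crunch's actual
defaults):

```properties
# Hypothetical default logging configuration for a Crunch project.
# Only takes effect when log4j is on the classpath; under "hadoop jar",
# Hadoop's own conf/log4j.properties takes precedence anyway.
log4j.rootLogger=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} - %m%n
```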

Ah, okay. So what we want for debugging is the Hadoop WARN logs. When
a hadoop job fails on the cluster, we have those logs available on the
JobTracker webpage (at least, I do in CDH, I assume it works the same
way in Hadoop 1.0.3), so enableDebug doesn't do anything for us
(besides altering the Configuration to force Crunch to put try-catch
blocks around the DoNode tasks, which I assume still works fine). I
use enableDebug to force the logging of Hadoop WARN statements on my
machine when I'm testing out pipelines, so in that case, it's only
affecting LocalJobRunner.
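
For local testing, roughly the same effect can be approximated without
log4j at all: when log4j is absent, commons-logging falls back to
java.util.logging, so raising the level on Hadoop's logger hierarchy
makes more verbose messages visible under LocalJobRunner. A minimal
sketch (enableHadoopDebugLogging is a hypothetical helper name, not a
Crunch API):

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class LocalDebugLogging {

    // Hypothetical helper: raise the java.util.logging level for the
    // org.apache.hadoop logger hierarchy so that debug-level messages
    // appear when a pipeline runs locally via LocalJobRunner.
    public static Logger enableHadoopDebugLogging() {
        Logger hadoopLogger = Logger.getLogger("org.apache.hadoop");
        hadoopLogger.setLevel(Level.FINE); // roughly log4j's DEBUG
        return hadoopLogger;
    }

    public static void main(String[] args) {
        Logger logger = enableHadoopDebugLogging();
        System.out.println(logger.getLevel()); // prints FINE
    }
}
```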

Given that, what's the best approach here? Javadoc statement on the
function indicating its intended use, or is there a better option?
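
If we go the Javadoc route, something along these lines might do (the
wording and the surrounding class are a sketch, not Crunch's actual
source):

```java
// Hypothetical placeholder class; in Crunch the method lives on the
// pipeline implementation.
public class PipelineSketch {

    /**
     * Turns on debug-level logging and extra error handling for this
     * pipeline.
     * <p>
     * Note: the logging side of this call only has an effect when the
     * pipeline runs locally via LocalJobRunner; on a real cluster, log
     * levels are governed by the cluster's own log4j configuration, and
     * failed-task logs are available from the JobTracker web UI. The call
     * also alters the Configuration so that Crunch wraps DoNode execution
     * in try-catch blocks for more detailed error reporting.
     */
    public void enableDebug() {
        // placeholder body; the real implementation lives in Crunch
    }
}
```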

>> Assuming that is the case, perhaps we could add a default
>> log4j.properties file to the maven archetype that generates Crunch
>> projects?
> Ah, that reminds me: We haven't decided yet if we want an archetype in
> Crunch.

I want one. I thought you created it? I remember seeing an email-- if
I didn't reply, it was b/c I was in the midst of that crazy travel
week and my sleep schedule was off (honestly, I'm just now

> Regards,
>   Matthias

Director of Data Science
Twitter: @josh_wills
