incubator-crunch-dev mailing list archives

From Josh Wills
Subject Re: Logging/Debugging
Date Mon, 24 Sep 2012 19:01:34 GMT
On Mon, Sep 24, 2012 at 11:09 AM, Matthias Friedrich <> wrote:
> On Sunday, 2012-09-23, Josh Wills wrote:
>> On Sun, Sep 23, 2012 at 2:11 AM, Matthias Friedrich <> wrote:
>>> On Saturday, 2012-09-22, Josh Wills wrote:
>>> [...]
>> Ah, okay. So what we want for debugging is the Hadoop WARN logs. When
>> a hadoop job fails on the cluster, we have those logs available on the
>> JobTracker webpage (at least, I do in CDH, I assume it works the same
>> way in Hadoop 1.0.3), so enableDebug doesn't do anything for us
>> (besides altering the Configuration to force Crunch to put try-catch
>> blocks around the DoNode tasks, which I assume still works fine). I
>> use enableDebug to force the logging of Hadoop WARN statements on my
>> machine when I'm testing out pipelines, so in that case, it's only
>> affecting LocalJobRunner.
> Yep. I think we could remove the log4j setup code in
> enableDebug(), and the log4j dependency from Crunch and the behavior
> on the cluster should still be the same. The same holds for
> LocalJobRunner when running via "hadoop jar".
> Running the LocalJobRunner from the IDE is the problem because then
> we need a logging backend on the classpath. If we don't have log4j,
> then java.util.logging is used, which logs everything on INFO level.
> As soon as log4j is on the classpath, however, the user really needs
> a or log4j will complain that it doesn't have
> configuration (and logs nothing).
>> Given that, what's the best approach here? Javadoc statement on the
>> function indicating its intended use, or is there a better option?
> I'd say let's remove from Crunch, because users
> can't defend themselves against it. We have local applications at
> work that run some parts locally, without anything Hadoop-specific;
> shipping a with Crunch would cause problems for us.
> We could then add a to src/main/resources in the
> archetype with an explanation of when exactly this configuration is
> used (only when running from the IDE). We would keep enableDebug()
> with its setting of "crunch.debug", but remove the log4j code, and add
> a "provided" log4j dependency to the archetype (because log4j is
> missing from hadoop-core).
> Does this make sense? Will this give you the logging/debugging output
> that you need?

I'm on board with that plan. My one tweak would be to add the log4j
hack that turns on Hadoop's WARN logs to the crunch-test functionality,
which I think will serve my needs from within the IDE and won't
interfere with any production or client log settings. Does that meet
your goals as well?
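
As a rough illustration of the kind of test-only helper I have in mind,
here is a minimal sketch. It uses java.util.logging rather than log4j so
that it has no dependencies, and the class name HadoopTestLogging and its
methods are made up for this example; nothing like this exists in Crunch
today. The point is only that the level change lives in test code and
can be undone afterwards.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical test-only helper; not part of Crunch or crunch-test. */
public final class HadoopTestLogging {
    // Keep a strong reference: java.util.logging holds loggers weakly,
    // so a level set on an otherwise unreferenced logger can be lost.
    private static final Logger HADOOP_LOGGER =
            Logger.getLogger("org.apache.hadoop");

    private HadoopTestLogging() {}

    /**
     * Restricts all org.apache.hadoop.* loggers to WARNING and above.
     * Returns the previous level (null means "inherit from parent").
     */
    public static Level enableWarnOnly() {
        Level previous = HADOOP_LOGGER.getLevel();
        HADOOP_LOGGER.setLevel(Level.WARNING);
        return previous;
    }

    /** Puts back whatever level was in effect before enableWarnOnly(). */
    public static void restore(Level previous) {
        HADOOP_LOGGER.setLevel(previous);
    }
}
```

A test would call enableWarnOnly() in its setup and restore() in its
teardown, so nothing leaks into production or client log settings.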

> [...]
>>> Ah, that reminds me: We haven't decided yet if we want an archetype in
>>> Crunch.
>> I want one. I thought you created it? I remember seeing an email-- if
>> I didn't reply, it was b/c I was in the midst of that crazy travel
>> week and my sleep schedule was off (honestly, I'm just now
>> recovering.)
> No worries, I'm a bit sleep-deprived myself so I can relate. With
> Gabriel we're +3 pro archetype, so I'll make a patch this weekend.
> Regards,
>   Matthias

Director of Data Science
Twitter: @josh_wills
