incubator-crunch-dev mailing list archives

From Matthias Friedrich <m...@mafr.de>
Subject Re: Logging/Debugging
Date Mon, 24 Sep 2012 18:09:43 GMT
On Sunday, 2012-09-23, Josh Wills wrote:
> On Sun, Sep 23, 2012 at 2:11 AM, Matthias Friedrich <matt@mafr.de> wrote:
>> On Saturday, 2012-09-22, Josh Wills wrote:
>> [...]
 
> Ah, okay. So what we want for debugging is the Hadoop WARN logs. When
> a hadoop job fails on the cluster, we have those logs available on the
> JobTracker webpage (at least, I do in CDH, I assume it works the same
> way in Hadoop 1.0.3), so enableDebug doesn't do anything for us
> (besides altering the Configuration to force Crunch to put try-catch
> blocks around the DoNode tasks, which I assume still works fine). I
> use enableDebug to force the logging of Hadoop WARN statements on my
> machine when I'm testing out pipelines, so in that case, it's only
> affecting LocalJobRunner.

Yep. I think we could remove log4j.properties, the log4j setup code in
enableDebug(), and the log4j dependency from Crunch and the behavior
on the cluster should still be the same. The same holds for
LocalJobRunner when running via "hadoop jar".

Running the LocalJobRunner from the IDE is the problem because then
we need a logging backend on the classpath. If we don't have log4j,
then java.util.logging is used, which logs everything at INFO level
and above. As soon as log4j is on the classpath, however, the user
really needs a log4j.properties, or log4j will complain that it has
no configuration (and log nothing).
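To make the two IDE cases concrete, here is a small JDK-only Java sketch. It is a simplified illustration, not the actual commons-logging discovery code: it only checks whether the log4j 1.x class org.apache.log4j.Logger is visible, and shows that with stock JDK settings java.util.logging keeps INFO and drops FINE. The names BackendCheck, log4jOnClasspath and julWouldLog are made up for this example.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/**
 * Simplified illustration (not the real commons-logging algorithm) of the
 * two IDE scenarios: log4j present vs. absent on the classpath.
 */
public class BackendCheck {
    /** True if the log4j 1.x API class can be found on the classpath. */
    static boolean log4jOnClasspath() {
        try {
            // initialize=false: only test presence, don't run log4j's static init
            Class.forName("org.apache.log4j.Logger", false, BackendCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    /** With stock JDK settings, JUL keeps INFO and above and drops FINE and below. */
    static boolean julWouldLog(Level level) {
        return Logger.getLogger("org.apache.hadoop.mapred.LocalJobRunner").isLoggable(level);
    }

    public static void main(String[] args) {
        if (log4jOnClasspath()) {
            System.out.println("log4j found: it needs a log4j.properties to print anything");
        } else {
            System.out.println("no log4j: JUL logs INFO? " + julWouldLog(Level.INFO)
                + ", FINE? " + julWouldLog(Level.FINE));
        }
    }
}
```

The JUL results assume no custom logging.properties has been supplied via java.util.logging.config.file.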

> Given that, what's the best approach here? Javadoc statement on the
> function indicating its intended use, or is there a better option?

I'd say let's remove log4j.properties from Crunch, because users
can't easily override it once it's on their classpath. We have
applications at work that run some parts locally, without anything
Hadoop-specific; shipping a log4j.properties with Crunch would cause
problems for us.
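A quick way to see which log4j.properties files an application actually has visible is to ask the class loader for all of them. This is a hypothetical helper (ConfigShadowCheck is not part of Crunch or Hadoop), and it assumes log4j 1.x's default initialization, which loads the first log4j.properties found on the classpath; a copy shipped inside a dependency jar that comes earlier on the classpath can therefore shadow the application's own.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper: lists every copy of a resource visible on the classpath. */
public class ConfigShadowCheck {
    static List<URL> findAll(String resource) throws IOException {
        // Prefer the context class loader, as many frameworks do for resource lookup
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        if (cl == null) {
            cl = ClassLoader.getSystemClassLoader();
        }
        // getResources returns the copies in the class loader's search order
        return Collections.list(cl.getResources(resource));
    }

    public static void main(String[] args) throws IOException {
        List<URL> copies = findAll("log4j.properties");
        if (copies.size() > 1) {
            System.out.println("Multiple log4j.properties found; the first one is likely used:");
        }
        for (URL u : copies) {
            System.out.println("  " + u);
        }
    }
}
```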
 
We could then add a log4j.properties to src/main/resources in the
archetype with an explanation of when exactly this configuration is
used (only when running from the IDE). We would keep enableDebug()
with its setting of "crunch.debug", but remove the log4j code, and add
a "provided" log4j dependency to the archetype (because log4j is
missing from hadoop-core).
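For reference, a rough sketch of the kind of switch "crunch.debug" is: a boolean read from the job configuration that decides whether each task gets wrapped in try/catch. This is hypothetical code, not Crunch's DoNode implementation; a plain Map stands in for Hadoop's Configuration, and DebugWrapping and runAll are invented names.

```java
import java.util.List;
import java.util.Map;
import java.util.logging.Level;
import java.util.logging.Logger;

/**
 * Hypothetical sketch: a "crunch.debug" flag in a configuration map decides
 * whether each task is wrapped in try/catch. NOT Crunch's actual DoNode code.
 */
public class DebugWrapping {
    private static final Logger LOG = Logger.getLogger(DebugWrapping.class.getName());

    /** Runs every task; returns how many failed (only counted in debug mode). */
    public static int runAll(Map<String, String> conf, List<Runnable> tasks) {
        boolean debug = Boolean.parseBoolean(conf.getOrDefault("crunch.debug", "false"));
        int failures = 0;
        for (Runnable task : tasks) {
            if (!debug) {
                task.run(); // normal mode: the first exception propagates and fails the run
                continue;
            }
            try {
                task.run();
            } catch (RuntimeException e) {
                // debug mode: log the failure as a warning and keep going
                LOG.log(Level.WARNING, "task failed", e);
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        List<Runnable> tasks = List.of(
            () -> {},
            () -> { throw new IllegalStateException("bad record"); });
        System.out.println(runAll(Map.of("crunch.debug", "true"), tasks)); // prints 1
    }
}
```

With the flag off, the first exception escapes runAll; with it on, each failure is logged at WARNING and counted instead.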

Does this make sense? Will this give you the logging/debugging output
that you need?

[...] 
>> Ah, that reminds me: We haven't decided yet if we want an archetype in
>> Crunch.
 
> I want one. I thought you created it? I remember seeing an email-- if
> I didn't reply, it was b/c I was in the midst of that crazy travel
> week and my sleep schedule was off (honestly, I'm just now
> recovering.)
 
No worries, I'm a bit sleep-deprived myself so I can relate. With
Gabriel we're +3 pro archetype, so I'll make a patch this weekend.
 
Regards,
  Matthias
