hadoop-pig-dev mailing list archives

From "Patrick Hunt (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-12) Please add timestamps to pig map/reduce progress messages
Date Mon, 03 Dec 2007 23:00:48 GMT

    [ https://issues.apache.org/jira/browse/PIG-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548034 ]

Patrick Hunt commented on PIG-12:

I looked into the latest comments and I don't think you want to do this, however:

Adding timestamps to Pig output is simple; change line 160 to this:

        String logfmt = "%d [%t] %-5p %c - %m%n";
        ConsoleAppender screen = new ConsoleAppender(new PatternLayout(logfmt));

All Pig (note: Pig only) output will now have timestamps. However, all Hadoop output will keep
the standard output format (no timestamp). I don't think this is what you/we want. This happens
because Pig uses its own appender rather than the root appender (see PigContext.java,
line 113). Hadoop seems to be using the root.
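
To make the scoping effect concrete, here is a minimal sketch of the same situation. It deliberately uses java.util.logging from the JDK rather than log4j, so that it runs without extra jars; a JUL handler plays the role of a log4j appender, a formatter plays the role of a layout, and setUseParentHandlers(false) corresponds roughly to turning off additivity. The class name, the two formatters, and the messages are made up for illustration and are not taken from the Pig code.

```java
import java.io.ByteArrayOutputStream;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.logging.Formatter;
import java.util.logging.LogRecord;
import java.util.logging.Logger;
import java.util.logging.StreamHandler;

// Hypothetical demo class; a java.util.logging analogy, NOT the log4j API.
public class ScopedTimestampDemo {

    /** Prefixes each record with a timestamp (the role "%d" plays in a log4j pattern). */
    static class TimestampFormatter extends Formatter {
        @Override
        public String format(LogRecord r) {
            String ts = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss,SSS", Locale.ROOT)
                    .format(new Date(r.getMillis()));
            return ts + " " + r.getLevel() + " " + r.getLoggerName()
                    + " - " + formatMessage(r) + "\n";
        }
    }

    /** No timestamp: stands in for whatever layout the root logger was given. */
    static class PlainFormatter extends Formatter {
        @Override
        public String format(LogRecord r) {
            return r.getLevel() + " " + r.getLoggerName() + " - " + formatMessage(r) + "\n";
        }
    }

    /** Returns { text captured by the Pig-level handler, text captured by the root handler }. */
    static String[] run() {
        ByteArrayOutputStream pigOut = new ByteArrayOutputStream();
        ByteArrayOutputStream rootOut = new ByteArrayOutputStream();

        // The root logger is configured "by someone else" (user or Hadoop config).
        Logger root = Logger.getLogger("");
        StreamHandler rootHandler = new StreamHandler(rootOut, new PlainFormatter());
        root.addHandler(rootHandler);

        // Only the org.apache.pig subtree gets the timestamped handler.
        Logger pig = Logger.getLogger("org.apache.pig");
        StreamHandler pigHandler = new StreamHandler(pigOut, new TimestampFormatter());
        pig.addHandler(pigHandler);
        pig.setUseParentHandlers(false); // do not also pass Pig records up to the root

        Logger pigChild = Logger.getLogger("org.apache.pig.Main");
        Logger hadoop = Logger.getLogger("org.apache.hadoop.mapred");
        pigChild.info("pig message");   // handled by the Pig-level handler
        hadoop.info("hadoop message");  // falls through to the root handler

        pigHandler.flush();
        rootHandler.flush();

        // Undo the changes so repeated calls start from a clean state.
        pig.removeHandler(pigHandler);
        pig.setUseParentHandlers(true);
        root.removeHandler(rootHandler);
        return new String[] { pigOut.toString(), rootOut.toString() };
    }

    public static void main(String[] args) {
        String[] out = run();
        System.out.print("pig handler:  " + out[0]);
        System.out.print("root handler: " + out[1]);
    }
}
```

Running main should show the Pig line carrying a timestamp and the Hadoop line without one, which is the mismatch described above. Getting both in the same format would mean configuring the root, either in code or through the user's logging configuration.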

The log4j architecture is complex and built around extreme flexibility. Typically you configure
things like the timestamp through configuration, not programmatically. The issue I see is
that we have no easy way to configure the root logger "in the code", rather we are at the
mercy of the user/hadoop configuration. We can't assume a particular root logging scheme in
our code.

I don't think you want to do this. I'm no expert in log4j, but my understanding is that what
you are suggesting won't work unless we make certain assumptions which we then bake into the
code. Primarily we would need to override the root logger with our own log configuration (down
at the root; currently we only do this at the org.apache.pig level). This may be a viable option,
but you would need to think about the effects...

> Please add timestamps to pig map/reduce progress messages
> ---------------------------------------------------------
>                 Key: PIG-12
>                 URL: https://issues.apache.org/jira/browse/PIG-12
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>            Reporter: Olga Natkovich
> From one of the users: 
> ------------------------------
> I'm spending a lot of time trying to optimize my pig queries for short
> run-times.  This process would be much easier if, in the progress output
> from pig (currently on stdout, but hopefully soon moving to stderr?!), the
> initiation and completion of each map/reduce job could be timestamped.  Pig
> already spits out messages of the form "----- MapReduce Job -----", "Input:
> ...", "Combine: ...", etc; could you just add a "Timestamp: ..."
> field as well?  Or ideally, both "Starting timestamp: ..." and "Finishing
> timestamp ...".
> Additional comments from another user:
> ------------------------------------------------------
> I'm adding my vote for this as well.
> I'd like to know timestamp and "running time" in seconds or D;H:M:S:
> Thu Oct 25 10:06:01 GMT 2007 (0:00:12:56): 56% done
> Starting and stopping timestamps in the log would also be valuable.
> Unfortunately, there's no "workaround" such as putting a date command before and after
> the pig command in logging --
> queuing times can be seconds to hours and completely mess up any notion of job execution

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
