hadoop-common-dev mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-1553) Extensive logging of C++ application can slow down task by an order of magnitude
Date Wed, 01 Aug 2007 01:41:53 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HADOOP-1553:

    Attachment: new-log.patch

This patch fixes the performance problems with user task logging. Before the patch, running
the word count example on a given input (Alice in Wonderland *smile*) would take 6 seconds
normally and minutes if the program printed to stdout. After the patch, it takes 4 seconds
with no stdout and 6 seconds with printing.

This patch includes several incompatible changes:
  1. The user logs are no longer stored in segments, but rather as complete files.
  2. All tasks are launched via bash to get input redirection.
  3. The cap on user logs has been turned off by default. It is still available, but makes
the command used to launch tasks much more complicated.
  4. When the cap is set, up to the full cap length of log data is now held in memory rather
than on disk. Thus, setting the cap to a large value may cause problems.
  5. The task logger has fewer configuration knobs; several have been removed from log4j.properties.
  6. The urls to access the task logs from the task tracker have changed. The new urls only
have start and end offsets, but the offsets may be either positive from the start of the file
or negative from the end of the file. 
  7. The jsp has been replaced by a servlet, so that the bytes don't need to be interpreted
as a string.
  8. The servlet does not buffer the entire log into memory before it is sent to the user.
  9. The TaskLog class is now public so that pipes can use it.
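
To illustrate items 6 through 8, here is a minimal sketch (not the code in new-log.patch) of how
signed offsets could be resolved and a byte range streamed without buffering the whole log.
The class name LogRange, the methods resolve and copyRange, the half-open [start, end) range,
and the rule that a negative offset n means fileLength + n are all assumptions made for this
example; the patch may use a different convention.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper; none of these names come from the patch.
class LogRange {

    /**
     * Maps a signed offset to an absolute position in [0, fileLength].
     * Assumed convention: offset >= 0 counts from the start of the file,
     * offset < 0 counts back from the end (fileLength + offset).
     */
    static long resolve(long offset, long fileLength) {
        long pos = offset < 0 ? fileLength + offset : offset;
        return Math.max(0, Math.min(pos, fileLength));
    }

    /**
     * Copies the bytes in [start, end) from in to out through a small
     * fixed-size buffer, so memory use does not grow with the log size.
     * The data is handled as raw bytes and never decoded into a String.
     * Returns the number of bytes copied.
     */
    static long copyRange(InputStream in, OutputStream out,
                          long start, long end, long fileLength)
            throws IOException {
        long from = resolve(start, fileLength);
        long to = resolve(end, fileLength);

        // Advance to the first requested byte.
        long skipped = 0;
        while (skipped < from) {
            long n = in.skip(from - skipped);
            if (n <= 0) {
                // skip() made no progress; fall back to reading one byte.
                if (in.read() < 0) {
                    return 0; // stream is shorter than fileLength claimed
                }
                n = 1;
            }
            skipped += n;
        }

        byte[] buf = new byte[4096];
        long remaining = to - from; // zero or negative means nothing to copy
        long copied = 0;
        while (remaining > 0) {
            int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
            if (n < 0) {
                break; // reached end of stream early
            }
            out.write(buf, 0, n);
            remaining -= n;
            copied += n;
        }
        return copied;
    }
}
```

Under these assumptions, copyRange(in, out, -4096, Long.MAX_VALUE, length) would send the last
4096 bytes of a log, since the oversized end offset is clamped to the file length.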

> Extensive logging of C++ application can slow down task by an order of magnitude
> --------------------------------------------------------------------------------
>                 Key: HADOOP-1553
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1553
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.13.0
>            Reporter: Christian Kunz
>            Assignee: Owen O'Malley
>            Priority: Blocker
>             Fix For: 0.14.0
>         Attachments: new-log.patch
> We observed that extensive logging (due to some configuration mistake) of a C++ application
> using the pipes interface can slow down the task by an order of magnitude. During that time
> disk usage was not high, with no abnormal memory usage, and basically idle CPU.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
