hadoop-common-dev mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-2048) DISTCP mapper should report progress more often
Date Mon, 22 Oct 2007 21:26:51 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HADOOP-2048:

    Status: Open  (was: Patch Available)

A few issues:
  1. Please use longer, more descriptive variable names.
  2. The failures shouldn't be stored; they should always be logged at the INFO level.
  3. I'd change the bfailed flag to a failureCount and have the final exception record the number of failures.
  4. Don't bother putting a time limit on the status reporting. The framework already throttles it to once a second.
  5. Just use the status message to record # bytes copied, # files copied, and # failures, since the details of a particular failure would be overwritten too quickly. You just want the user to know that there is something to look at in the logs.
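
To make points 2, 3 and 5 concrete, here is a rough sketch of the bookkeeping, not the actual patch. StatusSink is a hypothetical stand-in for whatever status hook the mapper has (for example the Reporter's setStatus), CopyProgress and its method names are made up for illustration, and System.err stands in for a real logger at INFO level.

```java
import java.io.IOException;

/** Hypothetical stand-in for the framework's status hook. */
interface StatusSink {
    void setStatus(String message);
}

/** Keeps running totals only; individual failures are logged, never stored. */
class CopyProgress {
    private long bytesCopied;
    private long filesCopied;
    private int failureCount; // a count instead of a boolean "failed" flag

    void fileCopied(long bytes, StatusSink sink) {
        bytesCopied += bytes;
        filesCopied++;
        sink.setStatus(statusMessage());
    }

    void fileFailed(String path, Exception cause, StatusSink sink) {
        failureCount++;
        // Assumption: a real mapper would use its logger at INFO level here.
        System.err.println("INFO: copy failed for " + path + ": " + cause.getMessage());
        sink.setStatus(statusMessage());
    }

    /** Totals only, so the status line stays meaningful when overwritten. */
    String statusMessage() {
        return "bytes copied: " + bytesCopied
            + ", files copied: " + filesCopied
            + ", failures: " + failureCount;
    }

    /** Call once at the end; the exception carries the failure count. */
    void finish() throws IOException {
        if (failureCount > 0) {
            throw new IOException("Copy failed with " + failureCount
                + " failure(s); see the task logs");
        }
    }
}
```

The status line only tells the user that failures happened and how many; the per-file details live in the log.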

> DISTCP mapper should report progress more often
> -----------------------------------------------
>                 Key: HADOOP-2048
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2048
>             Project: Hadoop
>          Issue Type: Bug
>            Reporter: Runping Qi
>            Assignee: Chris Douglas
>            Priority: Blocker
>             Fix For: 0.15.0
>         Attachments: 2048-2.patch, 2048-3.patch, 2048.patch
> When I ran DISTCP to copy files from one dfs to another, I noticed that some mappers got killed due to failing to report status for 606 seconds.
> I noticed that the mappers try to make a progress report for every 32MB copied. A better way to ensure progress is to use a time interval since last report.
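
The time-interval idea in the description could look roughly like the sketch below. This is an illustration under assumptions, not the code in the attached patches: IntervalReporter, its parameters, and the injectable clock are all hypothetical names. Note that review comment 4 above argues the framework already limits status updates, so a reporter like this may be unnecessary if the mapper simply reports after every buffer.

```java
import java.util.function.LongSupplier;

/** Reports progress when enough time has passed, regardless of bytes copied. */
class IntervalReporter {
    private final long intervalMillis;
    private final LongSupplier clock;  // e.g. System::currentTimeMillis
    private final Runnable report;     // whatever tells the framework "still alive"
    private long lastReport;

    IntervalReporter(long intervalMillis, LongSupplier clock, Runnable report) {
        this.intervalMillis = intervalMillis;
        this.clock = clock;
        this.report = report;
        this.lastReport = clock.getAsLong();
    }

    /** Call after each buffer is copied; returns true if a report was sent. */
    boolean maybeReport() {
        long now = clock.getAsLong();
        if (now - lastReport >= intervalMillis) {
            report.run();
            lastReport = now;
            return true;
        }
        return false;
    }
}
```

With this shape a slow copy still reports on schedule, because the check runs per buffer rather than per 32MB.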

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.