hadoop-common-dev mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-3) Output directories are not cleaned up before the reduces run
Date Wed, 22 Mar 2006 17:41:14 GMT
     [ http://issues.apache.org/jira/browse/HADOOP-3?page=all ]

Owen O'Malley updated HADOOP-3:

    Attachment: noclobber.patch

OK, this patch ensures that the output directory is set and does not already exist.
If the application wants to clobber old data, it needs to delete the files itself.
I added the check that the output directory is set because otherwise the job doesn't fail
until the reduces try to run. With the added check, jobs fail before they are submitted.

I wasn't sure we wanted to support the no-reduces case, but it was pretty easy to handle here
by not requiring an output directory.
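
For illustration only, the checks described above could be sketched roughly as follows. This is not the code in noclobber.patch: the class name OutputDirCheck, the method checkOutputDir, and the use of java.nio.file in place of Hadoop's own file system classes are all assumptions made to keep the sketch self-contained.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of a pre-submission output directory check.
// Names and the use of java.nio.file are assumptions, not the patch itself.
public class OutputDirCheck {

    /**
     * Throws if the job should be rejected before submission.
     *
     * @param outputDir  the configured output directory, or null if unset
     * @param numReduces the number of reduces requested for the job
     */
    public static void checkOutputDir(Path outputDir, int numReduces) throws IOException {
        // No-reduces case: no output directory is required.
        if (numReduces == 0) {
            return;
        }
        // Fail early if the output directory was never set.
        if (outputDir == null) {
            throw new IOException("Output directory not set.");
        }
        // No clobbering: an existing directory must be deleted by the application.
        if (Files.exists(outputDir)) {
            throw new IOException("Output directory " + outputDir + " already exists.");
        }
    }
}
```

Calling checkOutputDir before submitting the job means a bad configuration is reported to the caller immediately instead of surfacing later when the reduces start.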

> Output directories are not cleaned up before the reduces run
> ------------------------------------------------------------
>          Key: HADOOP-3
>          URL: http://issues.apache.org/jira/browse/HADOOP-3
>      Project: Hadoop
>         Type: Bug
>   Components: mapred
>     Versions: 0.1
>     Reporter: Owen O'Malley
>     Assignee: Owen O'Malley
>     Priority: Minor
>      Fix For: 0.1
>  Attachments: clean-out-dir.patch, noclobber.patch
> The output directory for the reduces is not cleaned up, so you can see leftovers
> from previous runs if they used more reduces. For example, if you run the application
> once with reduces=10 and then rerun with reduces=8, your output directory will have frag00000
> to frag00009, with the first 8 fragments from the second run and the last 2 fragments from
> the first run.
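
The reduces=10 then reduces=8 scenario can be reproduced with a small stand-alone simulation. This is only a sketch under stated assumptions: the class StaleFragments and its methods are hypothetical, plain local files stand in for the reduce outputs, and each fragment simply records which run wrote it. It needs Java 11 or later for Files.writeString and Files.readString.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical simulation of leftover fragments; not Hadoop code.
public class StaleFragments {

    // Simulates a run: writes frag00000 .. frag<numReduces-1>, overwriting
    // any existing fragment but never deleting extra ones.
    public static void runJob(Path dir, int numReduces, String runLabel) throws IOException {
        Files.createDirectories(dir);
        for (int i = 0; i < numReduces; i++) {
            Files.writeString(dir.resolve(String.format("frag%05d", i)), runLabel);
        }
    }

    // Returns the names of fragments that were not written by currentRun.
    public static List<String> staleFragments(Path dir, String currentRun) throws IOException {
        List<Path> files;
        try (Stream<Path> listing = Files.list(dir)) {
            files = listing.sorted().collect(Collectors.toList());
        }
        List<String> stale = new ArrayList<>();
        for (Path file : files) {
            if (!Files.readString(file).equals(currentRun)) {
                stale.add(file.getFileName().toString());
            }
        }
        return stale;
    }
}
```

Running runJob with 10 reduces and then with 8 reduces against the same directory leaves frag00008 and frag00009 from the first run, which is what staleFragments reports for the second run.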

This message is automatically generated by JIRA.
If you think it was sent incorrectly contact one of the administrators:
For more information on JIRA, see:
