hadoop-common-dev mailing list archives

From "Alejandro Abdelnur" <tuc...@gmail.com>
Subject Re: [jira] Updated: (HADOOP-1558) changes to OutputFormat to work on temporary directory to enable re-running crashed jobs (Issue: 1121)
Date Wed, 11 Jul 2007 00:40:29 GMT
Suggestions make sense.

I was looking at the Task class and it seems too Map/Reduce Task
specific so I'll need some help here.

Is it your intention to run the initialize/commit Tasks on the JT box,
or should they run on the slaves?

Thxs.

A

On 7/11/07, Doug Cutting (JIRA) <jira@apache.org> wrote:
>
>      [ https://issues.apache.org/jira/browse/HADOOP-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
> Doug Cutting updated HADOOP-1558:
> ---------------------------------
>
>     Fix Version/s:     (was: 0.14.0)
>            Status: Open  (was: Patch Available)
>
> This is a good feature, but it's going to be more complicated to implement.
> We only instantiate user classes in task and client jvms, never in jobtracker
> or tasktracker jvms.  So initialize() and commit() need to be run as tasks:
> InitializeTask and CommitTask.  Adding new task classes should be easy in
> principle, but it might not be in practice.  Also, getUncommittedOutputDirectory()
> is specific to file-based output formats and so does not belong in the
> OutputFormat interface, but rather on a base class for file-based outputs.
> We should probably rename OutputFormatBase to be FileOutputFormat, just as
> we renamed InputFormatBase to be FileInputFormat.
>
> > changes to OutputFormat to work on temporary directory to enable re-running crashed jobs (Issue: 1121)
> > ------------------------------------------------------------------------------------------------------
> >
> >                 Key: HADOOP-1558
> >                 URL: https://issues.apache.org/jira/browse/HADOOP-1558
> >             Project: Hadoop
> >          Issue Type: Improvement
> >          Components: mapred
> >         Environment: all
> >            Reporter: Alejandro Abdelnur
> >         Attachments: hadoop-1558-JUN1007-1934.txt
> >
> >
> > Add  OutputFormat methods like:
> > /** Called to initialize output for this job. */
> > void initialize(JobConf job) throws IOException;
> > /** Called to finalize output for this job. */
> > void commit(JobConf job) throws IOException;
> > In the base implementation for FileSystem output, initialize() might then
> > create a temporary directory for the job, removing any that already exists,
> > and commit() could rename the temporary output directory to the final name.
> > The existing checkOutputSpecs() would continue to throw an exception if the
> > final output already exists.
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
>
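
For reference, the initialize()/commit() flow described in the quoted issue can be sketched roughly as below. This is only an illustration under stated assumptions, not the attached patch and not Hadoop's API: it uses java.nio.file on the local filesystem instead of Hadoop's FileSystem and JobConf, and the class name TempDirOutputSketch and the "_temporary" sibling-directory naming are hypothetical. The method names initialize(), commit(), checkOutputSpecs() and getUncommittedOutputDirectory() are taken from the thread.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical stand-in for a file-based output format (local filesystem only).
class TempDirOutputSketch {
    private final Path finalDir;
    private final Path tempDir;

    TempDirOutputSketch(Path finalDir) {
        this.finalDir = finalDir;
        // Assumed convention: a sibling directory with a "_temporary" suffix.
        this.tempDir = finalDir.resolveSibling(finalDir.getFileName() + "_temporary");
    }

    /** Throws if the final output already exists. */
    void checkOutputSpecs() throws IOException {
        if (Files.exists(finalDir)) {
            throw new FileAlreadyExistsException(finalDir.toString());
        }
    }

    /** Creates a fresh temporary directory, removing any left over from a crashed run. */
    void initialize() throws IOException {
        deleteRecursively(tempDir);
        Files.createDirectories(tempDir);
    }

    /** Directory that tasks would write to until the job is committed. */
    Path getUncommittedOutputDirectory() {
        return tempDir;
    }

    /** Renames the temporary directory to the final name. */
    void commit() throws IOException {
        Files.move(tempDir, finalDir);
    }

    private static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        List<Path> ordered;
        try (Stream<Path> paths = Files.walk(dir)) {
            // Reverse order so that children are deleted before their parents.
            ordered = paths.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : ordered) {
            Files.delete(p);
        }
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("output-sketch");
        TempDirOutputSketch out = new TempDirOutputSketch(base.resolve("job-output"));
        out.checkOutputSpecs();
        out.initialize();
        Files.write(out.getUncommittedOutputDirectory().resolve("part-00000"), "data".getBytes());
        out.commit();
        System.out.println("committed: " + Files.exists(base.resolve("job-output").resolve("part-00000")));
    }
}
```

Because a re-run calls initialize() again, partial output from a crashed attempt is discarded, while the final directory only appears once commit() succeeds. Where these calls would actually execute (JT or slaves) is the open question above and is not addressed by this sketch.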
