hadoop-common-dev mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-933) Application defined InputSplits do not work
Date Fri, 09 Feb 2007 21:55:06 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HADOOP-933:

    Attachment: client-split.patch

This patch fixes this bug and HADOOP-867.

The JobClient creates an InputFormat object and generates the list of InputSplits using getSplits.
The list of InputSplits is written to a dfs file next to the job.xml. (More precisely, a
list of RawInputSplits is written to the file. Each RawInputSplit consists of the serialized
InputSplit, the class name, and the list of locations.) When the JobTracker initializes the
JobInProgress, it just has to read the serialized InputSplits and pass them down to the
TaskTrackers and to the Task. When the MapTask starts, it deserializes the InputSplit and
uses it to create the RecordReader. This has the advantage that non-FileSplit InputSplits
work (since the class is recorded) and that the user code is never loaded by the JobTracker.
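
The flow described above can be illustrated with a minimal, self-contained sketch. It is not
the code in client-split.patch: the names RawSplitSketch, Split, RangeSplit, RawSplit, toRaw
and fromRaw are all hypothetical, and Split is only a stand-in for Hadoop's Writable-style
split interface. It shows the client packing a split into opaque bytes plus a class name and
locations, and the task side resolving the class by name only when it needs the real object.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

public class RawSplitSketch {

    /** Hypothetical stand-in for a Writable-style InputSplit. */
    public interface Split {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    /** An application-defined split that is not a FileSplit (hypothetical). */
    public static class RangeSplit implements Split {
        long start;
        long end;

        public RangeSplit() {}  // no-arg constructor needed for reflective creation

        public RangeSplit(long start, long end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeLong(start);
            out.writeLong(end);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            start = in.readLong();
            end = in.readLong();
        }
    }

    /** The record stored next to job.xml: class name, opaque bytes, locations. */
    public static class RawSplit {
        final String className;
        final byte[] bytes;
        final String[] locations;

        RawSplit(String className, byte[] bytes, String[] locations) {
            this.className = className;
            this.bytes = bytes;
            this.locations = locations;
        }
    }

    /** Client side: serialize the split and record its class name. */
    static RawSplit toRaw(Split split, String[] locations) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        split.write(new DataOutputStream(buffer));
        return new RawSplit(split.getClass().getName(), buffer.toByteArray(), locations);
    }

    /** Task side: resolve the class with the task's class loader, then deserialize. */
    static Split fromRaw(RawSplit raw, ClassLoader loader) throws Exception {
        Class<?> splitClass = Class.forName(raw.className, true, loader);
        Split split = (Split) splitClass.getDeclaredConstructor().newInstance();
        split.readFields(new DataInputStream(new ByteArrayInputStream(raw.bytes)));
        return split;
    }

    public static void main(String[] args) throws Exception {
        RawSplit raw = toRaw(new RangeSplit(0, 1000), new String[] {"host1", "host2"});
        // An intermediary only handles raw.bytes and raw.locations; it never loads the class.
        RangeSplit back = (RangeSplit) fromRaw(raw, RawSplitSketch.class.getClassLoader());
        System.out.println(raw.className + " " + back.start + ".." + back.end);
    }
}
```

In this sketch only fromRaw ever touches the user class, which mirrors the point that the
intermediate daemons can forward the bytes and locations without loading user code.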

> Application defined InputSplits do not work
> -------------------------------------------
>                 Key: HADOOP-933
>                 URL: https://issues.apache.org/jira/browse/HADOOP-933
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.10.1
>            Reporter: Benjamin Reed
>         Assigned To: Owen O'Malley
>         Attachments: client-split.patch, JobInProgress.patch, MapTask.patch
> If an application defines its own InputSplit, the TaskTracker chokes because it cannot
> deserialize the InputSplit while deserializing the MapTasks it receives from the JobTracker.
> This is because the TaskTracker does not resolve classes from the job jar file. The attached
> patch delays resolution of the InputSplit until it is running in the context of the child
> process, where it can resolve the InputSplit class.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
