hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-hadoop Wiki] Update of "TaskExecutionEnvironment" by OwenOMalley
Date Wed, 19 Jul 2006 22:08:51 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by OwenOMalley:

New page:
= Hadoop Map/Reduce Task Execution Environment =

Hadoop Map/Reduce tasks (the generic term for maps or reduces) run distributed across a cluster.
Most tasks don't care about their environment, because they only use the ''standard''
inputs and outputs from the API, but some tasks do, and this page documents the details.

== Directories ==

All of the directories are relative to the ''<local>'' directory set in the !TaskTracker's
configuration. The !TaskTracker can define multiple local directories, and each filename is
assigned to one of them semi-randomly.

There are two task directories. The first is ''<local>''/taskTracker/''<taskId>''
and the other is ''<local>''/''<taskId>''. I'm not sure why there are two directories,
but different contents go in each. The first contains the job.jar, the job.xml, and the work directory.
The job.jar is the application's jar file, which is automatically distributed to each machine.
The job.xml is the serialization of the job's !JobConf after it has been ''localized'' for
that task. Task localization means that properties specific to this particular task within
the job have been set.
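The two directory paths can be sketched in plain Java. The ''<local>'' value and task id below are hypothetical placeholders; the real values come from the !TaskTracker configuration:

```java
public class TaskDirs {
    // First task directory: holds job.jar, job.xml, and the work directory.
    static String taskDir(String local, String taskId) {
        return local + "/taskTracker/" + taskId;
    }

    // Second task directory: holds the task's output files (see Filenames below).
    static String outputDir(String local, String taskId) {
        return local + "/" + taskId;
    }

    public static void main(String[] args) {
        // Hypothetical values for illustration only.
        String local = "/tmp/hadoop/mapred/local";
        String taskId = "task_0001_m_000003_0";
        System.out.println(taskDir(local, taskId));
        System.out.println(outputDir(local, taskId));
    }
}
```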

== Work Directory ==

The current working directory for the task is ''<local>''/taskTracker/''<taskId>''/work.
The job.jar is expanded in this directory before the task starts.

The localized job.xml and job.jar are both located in ''<local>''/taskTracker/''<taskId>''/.

== Processes ==

The task runs in its own Java virtual machine, which is forked as a child process of the !TaskTracker.
The !TaskTracker waits for the child process to finish and logs the event if a non-zero exit code is returned.

The task's class path is set to the server's class path followed by all of the jars in the
lib directory from the expanded job.jar followed by the expanded job.jar itself.
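The ordering can be sketched as a simple list concatenation; the method name and jar paths below are made up for illustration:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TaskClasspath {
    // Sketch of the ordering only: server classpath first, then lib/*.jar
    // from the expanded job.jar, then the expanded job.jar itself.
    static List<String> taskClasspath(List<String> serverClasspath,
                                      List<String> libJars,
                                      String expandedJobJar) {
        List<String> cp = new ArrayList<>(serverClasspath);
        cp.addAll(libJars);
        cp.add(expandedJobJar);
        return cp;
    }

    public static void main(String[] args) {
        System.out.println(taskClasspath(
                Arrays.asList("hadoop.jar"),
                Arrays.asList("work/lib/a.jar", "work/lib/b.jar"),
                "work"));
    }
}
```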

== Outputs ==

=== Output Streams ===

The standard output (stdout) and standard error (stderr) streams of the task are read by the
!TaskTracker and logged at the INFO level under the org.apache.hadoop.mapred.!TaskRunner logger.

=== Filenames ===

Map tasks put their outputs into ''<local>''/''<taskId>''/part-''<reduce>''.out.

Reduce tasks read their inputs from ''<local>''/''<taskId>''/map_''<map>''.out.
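The naming convention can be sketched as follows; the helper names and example values are hypothetical:

```java
public class TaskFiles {
    // Where a map task writes its output destined for a given reduce.
    static String mapOutput(String local, String mapTaskId, int reduce) {
        return local + "/" + mapTaskId + "/part-" + reduce + ".out";
    }

    // Where a reduce task reads the input copied from a given map.
    static String reduceInput(String local, String reduceTaskId, int map) {
        return local + "/" + reduceTaskId + "/map_" + map + ".out";
    }

    public static void main(String[] args) {
        System.out.println(mapOutput("/local", "task_m_0", 2));
        System.out.println(reduceInput("/local", "task_r_0", 5));
    }
}
```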

== Localized Properties in the JobConf ==

The following properties are localized for each task's !JobConf:

|| '''Name''' || '''Type''' || '''Description''' ||
|| mapred.job.id || String || The job id ||
|| mapred.task.id || String || The task id ||
|| mapred.task.is.map || boolean || Is this a map task ||
|| mapred.task.partition || int || The id of the task within the job ||
|| map.input.file || String || The filename that the map is reading from ||
|| map.input.start || long || The offset of the start of the map input split ||
|| map.input.length || long || The number of bytes in the map input split ||
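A task reads these values from its !JobConf. As a rough sketch that runs without Hadoop on the classpath, a plain java.util.Properties stands in for the localized !JobConf below; the property values are hypothetical:

```java
import java.util.Properties;

public class LocalizedProps {
    // Stand-in for JobConf.getBoolean("mapred.task.is.map", false).
    static boolean isMapTask(Properties conf) {
        return Boolean.parseBoolean(conf.getProperty("mapred.task.is.map", "false"));
    }

    public static void main(String[] args) {
        // Stand-in for the localized job.xml; values are made up.
        Properties conf = new Properties();
        conf.setProperty("mapred.task.id", "task_0001_m_000003_0");
        conf.setProperty("mapred.task.is.map", "true");
        conf.setProperty("map.input.start", "0");
        conf.setProperty("map.input.length", "67108864");

        long start = Long.parseLong(conf.getProperty("map.input.start"));
        long length = Long.parseLong(conf.getProperty("map.input.length"));
        System.out.println(isMapTask(conf) + " " + start + " " + length);
    }
}
```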
