hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "TaskExecutionEnvironment" by AmareshwariSriRamadasu
Date Thu, 27 Mar 2008 06:05:24 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by AmareshwariSriRamadasu:
http://wiki.apache.org/hadoop/TaskExecutionEnvironment

------------------------------------------------------------------------------
  
  All of the directories are relative to the ''<local>'' directory set in the !TaskTracker's
configuration. The !TaskTracker can define multiple local directories and each filename is
assigned to a semi-random local directory.
  
+ There are two directories. The first is ''<local>''/taskTracker/archive. This directory
holds the localized distributed cache; the localized distributed cache is therefore shared
among all the tasks and jobs (see the retrieval sketch after the directory listing below).
The second is the job-specific directory ''<local>''/taskTracker/jobcache/''<jobId>''.
The job directory has the following structure:
- There are two directories. The first is ''<local>''/taskTracker/jobcache/''<jobId>''/''<taskId>'',
the second is ''<local>''/''<taskId>''. I'm not sure why there are two directories,
but different stuff goes in them.  The first contains the job.xml. The job.xml is the serialization
of the job's !JobConf after it has been ''localized'' for that task. Task localization means
that properties have been set that are specific to this particular task within the job. The
second directory contains the temporary map reduce data generated by the framework. The job.jar
is contained in  ''<local>''/taskTracker/jobcache/''<jobId>''/. The job.jar is
the application's jar file that is automatically distributed to each machine.
- == Work Directory ==
  
- The current working directory for the task is ''<local>''/taskTracker/jobcache/''<jobId>''/work.
The job.jar is expanded in this directory before the tasks for the job start.
+   * ''<local>''/taskTracker/jobcache/''<jobId>''/ -- The job directory
+     * work -- The scratch space shared by the job's tasks
+     * jars -- The expanded job.jar
+     * job.xml -- The generic job conf
+     * ''<taskid>''/ -- The task directory
+       * job.xml -- The task-localized job conf
+       * output -- Intermediate map output files
+       * work -- The cwd of the task
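+ As noted above, the files under ''<local>''/taskTracker/archive are the localized distributed
cache. A minimal illustrative sketch of retrieving those localized files from task code, assuming
a Hadoop version where the !DistributedCache API used below is available and the job registered
the files with DistributedCache.addCacheFile() at submission time:
{{{
// Illustrative sketch: resolve files that the framework localized into
// <local>/taskTracker/archive via the DistributedCache API instead of
// constructing the TaskTracker-internal path by hand.
import java.io.IOException;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class CacheLookup {
  public static Path[] localizedCacheFiles(JobConf conf) throws IOException {
    // Local filesystem paths of the localized cache files for this job.
    return DistributedCache.getLocalCacheFiles(conf);
  }
}
}}}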
  
- The localized job.xml is located in  ''<local>''/taskTracker/jobcache/''<jobId>''/''<taskId>''/.
+ The job directory contains the ''job.xml'' file and the ''jars'', ''work'' and ''<taskid>''
directories. This ''job.xml'' is the serialization of the job's generic !JobConf; the
task-localized version is kept in each ''<taskid>'' directory (see below). The job.jar is
contained in ''<local>''/taskTracker/jobcache/''<jobId>''/jars/. The job.jar is the
application's jar file that is automatically distributed to each machine. It is expanded
into the ''jars'' directory before the tasks for the job start. The job.jar location is
accessible to the application through the api [http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/JobConf.html#getJar()
JobConf.getJar()]. To access the unjarred directory, take the parent of that path, for
example ''new Path(JobConf.getJar()).getParent()''.
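+ A minimal illustrative sketch of the above (the class and method names are made up for the
example), assuming JobConf.getJar() inside a task returns the localized job.jar path:
{{{
// Illustrative sketch: JobConf.getJar() returns the job.jar path, e.g.
// <local>/taskTracker/jobcache/<jobId>/jars/job.jar; its parent is the
// jars/ directory the jar was expanded into.
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class JarLocation {
  public static Path jobJar(JobConf conf) {
    return new Path(conf.getJar());
  }

  public static Path unjarredDir(JobConf conf) {
    // The parent of job.jar is the jars/ directory.
    return new Path(conf.getJar()).getParent();
  }
}
}}}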
+ 
+ The ''work'' directory in the job directory is the job-specific shared directory. The tasks
can use this space as scratch space and share files among themselves. This directory is exposed
to the users through ''job.local.dir''. The directory can be accessed through the api
[http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/JobConf.html#getJobLocalDir()
JobConf.getJobLocalDir()]. It is also available as a system property, so users can call
''System.getProperty("job.local.dir")''.
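+ A short illustrative sketch of the two ways described above to obtain ''job.local.dir'' from
task code:
{{{
// Illustrative sketch: two equivalent ways, per the text above, for a task to
// find the job-specific shared scratch directory.
import org.apache.hadoop.mapred.JobConf;

public class JobLocalDir {
  public static String viaConf(JobConf conf) {
    return conf.getJobLocalDir();
  }

  public static String viaSystemProperty() {
    return System.getProperty("job.local.dir");
  }
}
}}}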
+ 
+ The task directory in the job directory contains the ''job.xml'' file and the ''output'' and
''work'' directories. The ''job.xml'' is the !JobConf localized for the task. Task localization
means that properties have been set that are specific to this particular task within the job.
The ''output'' directory contains the temporary map reduce data generated by the framework,
such as the map output files. The ''work'' directory in the task directory is the working
directory of the child process. The work directory has a ''tmp'' directory to create temporary
files in, if ''mapred.child.tmp'' has the value ''./tmp''.
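+ A small illustrative sketch of creating a temporary file under the task's ''tmp'' directory,
assuming ''mapred.child.tmp'' is left at the relative value ''./tmp'':
{{{
// Illustrative sketch: with mapred.child.tmp=./tmp the work directory has a
// tmp subdirectory, so a relative ./tmp path resolves there from task code.
import java.io.File;
import java.io.IOException;

public class TaskTmp {
  public static File newTempFile(String prefix) throws IOException {
    File tmpDir = new File("./tmp");   // relative to the task's cwd (the work directory)
    return File.createTempFile(prefix, null, tmpDir);
  }
}
}}}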
  
  == Processes ==
  
@@ -42, +52 @@

  
  || '''Name''' || '''Type''' || '''Description''' ||
  || mapred.job.id || String || The job id ||
+ || mapred.jar || String || The job.jar location in the job directory ||
+ || job.local.dir || String || The job-specific shared scratch space ||
  || mapred.task.id || String || The task id ||
  || mapred.task.is.map || boolean || Is this a map task ||
  || mapred.task.partition || int || The id of the task within the job ||
  || map.input.file || String || The filename that the map is reading from ||
  || map.input.start || long || The offset of the start of the map input split ||
  || map.input.length || long || The number of bytes in the map input split ||
+ || mapred.work.output.dir || String || The task's temporary output directory ||
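+ A brief illustrative sketch of reading a few of these values from the localized !JobConf,
for example inside a Mapper's configure(JobConf) method:
{{{
// Illustrative sketch: the task-specific values above are plain entries in the
// localized job conf, so they can be read with the usual JobConf getters.
import org.apache.hadoop.mapred.JobConf;

public class TaskEnv {
  public static void dump(JobConf conf) {
    System.out.println("job id         : " + conf.get("mapred.job.id"));
    System.out.println("task id        : " + conf.get("mapred.task.id"));
    System.out.println("is map task    : " + conf.getBoolean("mapred.task.is.map", false));
    System.out.println("partition      : " + conf.getInt("mapred.task.partition", -1));
    System.out.println("map input file : " + conf.get("map.input.file"));   // map tasks only
    System.out.println("work output dir: " + conf.get("mapred.work.output.dir"));
  }
}
}}}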
  
