hadoop-user mailing list archives

From Kun Ling <lkun.e...@gmail.com>
Subject Re: How is sharing done in HDFS ?
Date Wed, 22 May 2013 09:41:09 GMT
Hi Agarwal,
    Thanks to Harsh J's reply, I have found the following code (based on
hadoop-1.0.4) that may help you:

   localizeJobTokenFile() in TaskTracker.java: localizes a file named
"jobToken".
   localizeJobConfFile() in TaskTracker.java: localizes the job
configuration file.
   Some distributed cache files are also localized, by calling
taskDistributedCacheManager.setupCache().

   All of the above functions are called in the initializeJob() method of
TaskTracker.java.
And the jobToken file is copied from the directory returned by
jobClient.getSystemDir(), which is initialized as a shared directory in
HDFS in offerService() of TaskTracker.java.
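As a rough illustration of that copy step, here is a minimal stand-alone sketch of the localization pattern, using java.nio.file as a stand-in for the HDFS FileSystem API so it runs without a cluster. The "jobToken" file name follows the description above; the class name, method name, and directory names are illustrative, not from the Hadoop source:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class LocalizeSketch {
    // Copies one job file from a shared "system directory" (in the real
    // code, the HDFS path returned by jobClient.getSystemDir()) into a
    // per-job local directory -- roughly what localizeJobTokenFile() does.
    static Path localize(Path systemDir, Path localJobDir, String name)
            throws IOException {
        Files.createDirectories(localJobDir);
        Path src = systemDir.resolve(name);
        Path dst = localJobDir.resolve(name);
        Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
        return dst;
    }

    public static void main(String[] args) throws IOException {
        // Stand-ins for the shared HDFS system dir and the
        // TaskTracker's local job directory.
        Path systemDir = Files.createTempDirectory("systemdir");
        Path localJobDir = Files.createTempDirectory("localjobdir");

        // The job client would have written the token here at submit time.
        Files.write(systemDir.resolve("jobToken"), "token-bytes".getBytes());

        Path localized = localize(systemDir, localJobDir, "jobToken");
        System.out.println(Files.exists(localized)); // prints "true"
    }
}
```

In the real TaskTracker the copy goes through Hadoop's FileSystem API against HDFS rather than the local filesystem, but the shape is the same: one download per tasktracker, into a job-local directory, before any task of that job starts.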

  To Harsh: after looking into the source code (based on hadoop-1.0.4),
I have the following questions:
    1. Where is the job.xml stored in the shared HDFS? Looking into the
code, I only found the readFields(DataInput in) method of class Task in
Task.java, and the only relevant statement is: jobFile = Text.readString(in)

   2. There is also a _partition.lst file, as well as the job.jar file,
which are also shared by all the tasks. However, I did not find any code
that localizes the _partition.lst file. Do you know what code in which file
makes the _partition.lst localization happen?

   3. Are there any other files that need to be shared, besides the
jobToken, job.xml, distributed cache files, _partition.lst, and job.jar?

   4. All of the above observations are based on the Hadoop 1.0.4 source
code. Has any of this changed in the latest hadoop-2.0-alpha or in Hadoop
trunk?

On Wed, May 22, 2013 at 4:45 PM, Harsh J <harsh@cloudera.com> wrote:

> The job-specific files, placed by the client, are downloaded individually
> by every tasktracker from the HDFS (The process is called "localization" of
> the task before it starts up) and then used.
> On Wed, May 22, 2013 at 1:59 PM, Agarwal, Nikhil <
> Nikhil.Agarwal@netapp.com> wrote:
>>  Hi,
>>
>> Can anyone guide me to some pointers or explain how HDFS shares the
>> information put in the temporary directories (hadoop.tmp.dir,
>> mapred.tmp.dir, etc.) to all other nodes?
>>
>> I suppose that during execution of a MapReduce job, the JobTracker
>> prepares a file called jobtoken and puts it in the temporary directories,
>> which needs to be read by all TaskTrackers. So, how does HDFS share the
>> contents? Does it use nfs mount or ….?
>>
>> Thanks & Regards,
>> Nikhil
> --
> Harsh J
