hadoop-mapreduce-dev mailing list archives

From Pedro Costa <psdc1...@gmail.com>
Subject Where the map task uses the set of locations?
Date Wed, 11 May 2011 16:52:51 GMT
Hi,

I was looking through the mapred code, searching for the point where the
split location is passed to the MapTask, and I found this line in the
TaskInProgress class:
[code]
t = new MapTask(jobFile, taskid, partition, splitClass, split,
                rawSplit.getFileName(), rawSplit.getLocations());
[/code]

The split variable holds the serialized split bytes. It is set like this:

[code]
BytesWritable split;
if (!jobSetup && !jobCleanup) {
    splitClass = rawSplit.getClassName();
    split = rawSplit.getBytes();
} else {
    split = new BytesWritable();
}
[/code]

"rawSplit.getFileName()" returns the full URL of the split file
(hdfs://chicon-7.fr:54310/user/xxx/gutenberg/A.txt), and the locations
are the servers where the split is stored ([chicon-7.fr,
chinqchint-21.fr, chinqchint-38.fr]).
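
For illustration only, the four pieces of data discussed here can be
modelled as a small stand-alone Java class. This is a sketch under
assumptions, not Hadoop's own code: the class RawSplitSketch, its
example() helper, the placeholder byte payload and the choice of
org.apache.hadoop.mapred.FileSplit as the split class name are all
hypothetical. Only the getter names, the URL and the host names come from
the message above.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in for the raw split described above; not Hadoop's class.
final class RawSplitSketch {
    private final String className;   // name of the split implementation class
    private final byte[] bytes;       // serialized split, opaque until deserialized
    private final String fileName;    // URL of the file the split belongs to
    private final String[] locations; // hosts reported as holding the split's data

    RawSplitSketch(String className, byte[] bytes, String fileName, String[] locations) {
        this.className = className;
        this.bytes = bytes.clone();
        this.fileName = fileName;
        this.locations = locations.clone();
    }

    String getClassName() { return className; }
    byte[] getBytes() { return bytes.clone(); }
    String getFileName() { return fileName; }
    String[] getLocations() { return locations.clone(); }

    // Builds an instance from the values quoted in the message.
    static RawSplitSketch example() {
        // Placeholder payload (assumption): a real split would be a serialized object.
        byte[] payload = "placeholder-split-bytes".getBytes(StandardCharsets.UTF_8);
        return new RawSplitSketch(
            "org.apache.hadoop.mapred.FileSplit", // assumed class name
            payload,
            "hdfs://chicon-7.fr:54310/user/xxx/gutenberg/A.txt",
            new String[] {"chicon-7.fr", "chinqchint-21.fr", "chinqchint-38.fr"});
    }

    public static void main(String[] args) {
        RawSplitSketch s = example();
        System.out.println("class     = " + s.getClassName());
        System.out.println("bytes     = " + s.getBytes().length + " (opaque)");
        System.out.println("fileName  = " + s.getFileName());
        System.out.println("locations = " + Arrays.toString(s.getLocations()));
    }
}
```

In this sketch the byte payload and the filename/locations pair are kept
as separate fields, which is the apparent redundancy the question below
is about.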


1 - Why are the split, the filename, and the set of locations all passed
when a MapTask is created? If the split is passed, I deduce that the map
task already contains the split bytes that it will use. So why not pass
just the split and ignore the filename and the set of locations?



Thanks

-- 
---------------------------
PSC
