hadoop-mapreduce-user mailing list archives

From Adam Shook <ash...@clearedgeit.com>
Subject Unusually large number of map tasks for a SequenceFile
Date Mon, 01 Aug 2011 21:19:07 GMT
Hi All,

I am writing a sequence file to HDFS from an application as a pre-process to a MapReduce job.
(It isn't being written from an MR job; it's just a plain open, write, close.)

The file is around 32 MB in size, yet when the MapReduce job starts up, it starts with 256 map
tasks. That first job writes SequenceFiles as well, and I fire up a second job with the first
job's output. The second job has around 32 KB of input but 138 map tasks. There are 128
part files, so there should only be 128 map tasks for this second job. This seems like an
unusually large number of map tasks, since the cluster is configured with the default block
size of 64 MB. I am using Hadoop v0.20.1.

Is there something special about how the SequenceFiles are being written?  Below is a code
sample showing how I write the first file.

Thanks,
Adam


import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// Open a SequenceFile writer with Text keys and Text values
FileSystem fs = FileSystem.get(new Configuration());
SequenceFile.Writer wrtr = SequenceFile.createWriter(fs, fs.getConf(),
        <path_to_file>, Text.class, Text.class);

// Append one record for every (s1, s2) pair
for (String s1 : strings1) {
    for (String s2 : strings2) {
        wrtr.append(new Text(s1), new Text(s2));
    }
}

wrtr.close();
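
In case it's relevant, the second job is pointed at the first job's output roughly like this
(a stripped-down sketch of my driver; the identity mapper/reducer and the path placeholders are
stand-ins for the real ones):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SequenceFileInputFormat;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

JobConf conf = new JobConf();
conf.setJobName("second-job");

// Read the SequenceFile part files produced by the first job
conf.setInputFormat(SequenceFileInputFormat.class);
FileInputFormat.setInputPaths(conf, new Path("<first_job_output>"));
FileOutputFormat.setOutputPath(conf, new Path("<second_job_output>"));

// Stand-ins for the real mapper/reducer classes
conf.setMapperClass(IdentityMapper.class);
conf.setReducerClass(IdentityReducer.class);
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(Text.class);

JobClient.runJob(conf);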
