hadoop-common-user mailing list archives

From "Runping Qi" <runp...@yahoo-inc.com>
Subject RE: Setting number of Maps
Date Tue, 03 Jul 2007 17:54:39 GMT

It seems your thinking is on the right track.
You can use one map/reduce job to split the input file containing the
complex numbers into the desired number of files. This should be easy to do.
Then you can run your main job on the split files, which will give you the
desired parallelism.
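
A rough sketch of that splitting step, in plain Python rather than as a real
map/reduce job. The function name `split_round_robin` and the `part-NNNNN` file
naming are made up for illustration. It deals the records out round-robin so
each output file gets a near-equal share; with the number of parts equal to the
number of records, each file holds exactly one.

```python
import os
import tempfile


def split_round_robin(records, num_parts, out_dir):
    """Write records into num_parts text files, dealt out round-robin.

    Hypothetical helper for illustration only; returns the file paths.
    """
    if num_parts < 1:
        raise ValueError("num_parts must be at least 1")
    # Record i goes to bucket i mod num_parts.
    buckets = [[] for _ in range(num_parts)]
    for i, rec in enumerate(records):
        buckets[i % num_parts].append(rec)
    paths = []
    # Write one file at a time so only one handle is open at once.
    for part, bucket in enumerate(buckets):
        path = os.path.join(out_dir, "part-%05d" % part)
        with open(path, "w") as fh:
            for rec in bucket:
                fh.write("%r\n" % (rec,))
        paths.append(path)
    return paths


if __name__ == "__main__":
    numbers = [complex(k, -k) for k in range(10)]
    with tempfile.TemporaryDirectory() as d:
        for f in split_round_robin(numbers, 4, d):
            with open(f) as fh:
                print(os.path.basename(f), len(fh.read().splitlines()))
```

The main job would then take the directory of part files as its input.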

One thing to keep in mind: your job must report progress frequently enough
during the 10 minutes of processing, so that the job tracker does not kill
your job due to a "timeout".
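
One way to arrange this is to break the long computation into smaller steps
and report between them. The sketch below is plain Python with a made-up
function name; the `report` callback is only a stand-in for whatever progress
hook the framework provides (in Hadoop's Java API the map method is handed a
Reporter object for this purpose).

```python
import time


def process_with_progress(steps, report, interval_seconds=30.0,
                          clock=time.monotonic):
    """Run each callable in steps and return their results.

    report(done, total) is called after a step finishes if at least
    interval_seconds have passed since the last report, and always after
    the final step. Nothing is reported while a step is still running, so
    each individual step has to be shorter than the timeout.
    """
    total = len(steps)
    results = []
    last = clock()
    for done, step in enumerate(steps, start=1):
        results.append(step())
        now = clock()
        if now - last >= interval_seconds or done == total:
            report(done, total)
            last = now
    return results


if __name__ == "__main__":
    work = [lambda k=k: k * k for k in range(5)]
    process_with_progress(
        work,
        lambda done, total: print("finished %d of %d" % (done, total)),
        interval_seconds=0.0,
    )
```

The `clock` parameter is there only so the timing can be replaced in tests.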


> -----Original Message-----
> From: Oliver Haggarty [mailto:ojh06@doc.ic.ac.uk]
> Sent: Tuesday, July 03, 2007 8:45 AM
> To: hadoop-user@lucene.apache.org
> Subject: Setting number of Maps
> Hi,
> I'm writing a mapreduce task that will take a load of complex numbers,
> do some processing on each then return a double. As this processing will
> be complex and could take up to 10 minutes I am using Hadoop to
> distribute this amongst many machines.
> So ideally for each complex number I want a new map task to spread the
> load most efficiently. A typical run might have as many as 7500 complex
> numbers that need processing. I will eventually have access to a cluster
> of approximately 500 machines.
> So far, the only way I can get one map task per complex number is to
> create a new SequenceFile for each number in the input directory. This
> takes a while though and I was hoping I could just create a single
> SequenceFile holding all the complex numbers, and then use the
> JobConf.setNumMapTasks(n) to get one map task per number in the file.
> This doesn't work though, and I end up with approx 60-70 complex numbers
> per map task (depending on the total number of input numbers).
> Does anyone have any idea why this second method doesn't work? If it is
> not supposed to work in this way are there any suggestions as to how to
> get a map per input record without having to put each one in a separate
> file?
> Thanks in advance for any help,
> Ollie
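
On why `JobConf.setNumMapTasks(n)` does not give one map per record: as far as
I understand it, the value is only a hint to the input format. File-based
input formats derive a split size in bytes from it, roughly
max(minSize, min(totalSize / n, blockSize)), and a SequenceFile can only be
cut at its sync markers, so a run of small records between two markers stays
together. The Python below just works through that arithmetic. The function
name and the sample sizes are made up for illustration, and the exact formula
may differ between Hadoop versions.

```python
def compute_split_size(total_bytes, requested_maps, block_bytes, min_bytes=1):
    """Approximate byte size of each input split for a requested map count."""
    goal = total_bytes // max(requested_maps, 1)
    return max(min_bytes, min(goal, block_bytes))


if __name__ == "__main__":
    # Hypothetical input: 7500 records of about 30 bytes each.
    total = 7500 * 30
    print(compute_split_size(total, 7500, 64 * 1024 * 1024))
```

If the requested split size is much smaller than the distance between sync
markers, some splits would contain no record start at all while others pick up
a whole run of records, which would be consistent with the 60-70 numbers per
map task described above.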
