hadoop-common-user mailing list archives

From Enis Soztutar <enis.soz.nu...@gmail.com>
Subject Re: Setting number of Maps
Date Thu, 05 Jul 2007 13:12:09 GMT
Just run two MR jobs sequentially. The first takes the input SequenceFile and 
writes each record (complex number) to its own file in a temp dir. In the 
second job, extend MultiFileInputFormat and override 
getRecordReader (possibly returning a custom record reader), and set the 
number of maps equal to the number of complex numbers. Then you can expect 
one record per map.
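A minimal sketch of what that second job's input format might look like, against the pre-0.20 `org.apache.hadoop.mapred` API in use at the time. The class name and `ComplexNumberRecordReader` are hypothetical placeholders; you would supply a record reader that parses one complex number per file:

```java
// Sketch only: assumes the old (pre-0.20) mapred API, where
// MultiFileInputFormat leaves getRecordReader abstract.
// ComplexNumberRecordReader is a hypothetical reader you would write
// yourself to emit one complex-number record per input file.
import java.io.IOException;

import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MultiFileInputFormat;
import org.apache.hadoop.mapred.MultiFileSplit;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

public class ComplexNumberInputFormat extends MultiFileInputFormat {

  public RecordReader getRecordReader(InputSplit split, JobConf job,
                                      Reporter reporter) throws IOException {
    // Each MultiFileSplit groups one or more of the per-number files;
    // the reader iterates over them and yields one record per file.
    return new ComplexNumberRecordReader(job, (MultiFileSplit) split);
  }
}
```

You would then call `job.setNumMapTasks(n)` with n equal to the number of complex numbers, so the split computation aims for one file per split.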

Runping Qi wrote:
> Seems your thinking is on the right track.
> You can use one map/reduce job to split your input file containing the
> complex numbers into the desired number of files. This should be easy to do.
> Then you can run your main job on the split files, which will give you the
> desired parallelism.
>
> One thing to keep in mind is that your job needs to report progress
> frequently enough during the 10-minute processing, so that the job
> tracker will not kill your task due to a "timeout".
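Runping's point about progress reporting can be sketched as follows, again against the old mapred API. The loop structure and `doExpensiveStep` are illustrative placeholders for breaking the long computation into reportable chunks:

```java
// Sketch only: periodic progress reporting from inside a long-running
// map() in the pre-0.20 mapred API. NUM_STEPS and doExpensiveStep are
// hypothetical; the idea is to call reporter.progress() regularly so the
// task is not killed by the task-timeout check.
public void map(WritableComparable key, Writable value,
                OutputCollector output, Reporter reporter) throws IOException {
  for (int step = 0; step < NUM_STEPS; step++) {
    doExpensiveStep(step);   // one slice of the ~10-minute computation
    reporter.progress();     // tell the framework this task is still alive
  }
  output.collect(key, result);
}
```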
>
> Runping
>
>
>   
>> -----Original Message-----
>> From: Oliver Haggarty [mailto:ojh06@doc.ic.ac.uk]
>> Sent: Tuesday, July 03, 2007 8:45 AM
>> To: hadoop-user@lucene.apache.org
>> Subject: Setting number of Maps
>>
>> Hi,
>>
>> I'm writing a mapreduce task that will take a load of complex numbers,
>> do some processing on each, then return a double. As this processing is
>> complex and could take up to 10 minutes, I am using Hadoop to
>> distribute it amongst many machines.
>>
>> So ideally I want a new map task for each complex number, to spread the
>> load most efficiently. A typical run might have as many as 7500 complex
>> numbers that need processing. I will eventually have access to a cluster
>> of approximately 500 machines.
>>
>> So far, the only way I can get one map task per complex number is to
>> create a new SequenceFile for each number in the input directory. This
>> takes a while, though, and I was hoping I could just create a single
>> SequenceFile holding all the complex numbers, and then use
>> JobConf.setNumMapTasks(n) to get one map task per number in the file.
>> This doesn't work, though, and I end up with approximately 60-70 complex
>> numbers per map task (depending on the total number of input numbers).
>>
>> Does anyone have any idea why this second method doesn't work? If it is
>> not supposed to work in this way are there any suggestions as to how to
>> get a map per input record without having to put each one in a separate
>> file?
>>
>> Thanks in advance for any help,
>>
>> Ollie
>>     
>
>
>   
