hadoop-general mailing list archives

From Dan Harvey <dan.har...@mendeley.com>
Subject DBInputFormat number of mappers
Date Tue, 13 Apr 2010 14:46:52 GMT
Hi,

I'm importing data from a MySQL database, using DBInputFormat to go over
the rows in a table and put them into HBase from the mapper, but I can't
find a way to increase the number of maps the input is split into. I am
running this on a cluster of 5 nodes, where each node has a maximum of 2 map
tasks. So, for example, if I set the number of rows to import to 10,000,000
then there are only 2 map tasks, and only two of the nodes are used.

I've tried increasing the number manually in the code with:

job.getConfiguration().setInt("mapred.map.tasks", 4);

I've also tried setting the same property on the command line, and
increasing the maximum number of map tasks per node.
But in all cases mapred.map.tasks is set to 2 in the job XML config file.

I've had a look at the code and DBInputFormat splits the total number of
rows over mapred.map.tasks, so I'm guessing the only problem is getting that
value to change.
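
To make the splitting described above concrete, here is a minimal sketch of
that arithmetic in plain Java. It is not the Hadoop source: the class name
SplitSketch and the method splitRows are hypothetical, and it assumes the
row count is divided into equal chunks with the last split absorbing any
remainder.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not DBInputFormat itself.
public class SplitSketch {

    /**
     * Divides rowCount rows into numMaps contiguous ranges.
     * Each element is {start, end} with end exclusive.
     * Assumption: the last range takes any remainder rows.
     */
    static List<long[]> splitRows(long rowCount, int numMaps) {
        if (numMaps < 1) {
            throw new IllegalArgumentException("numMaps must be at least 1");
        }
        List<long[]> splits = new ArrayList<>();
        long chunkSize = rowCount / numMaps;
        for (int i = 0; i < numMaps; i++) {
            long start = i * chunkSize;
            // The final split runs to the end of the table.
            long end = (i + 1 == numMaps) ? rowCount : start + chunkSize;
            splits.add(new long[] {start, end});
        }
        return splits;
    }

    public static void main(String[] args) {
        long rows = 10_000_000L;
        // Compare the value the job ends up with (2) to the one requested (4).
        for (int maps : new int[] {2, 4}) {
            System.out.println(maps + " map tasks:");
            for (long[] s : splitRows(rows, maps)) {
                System.out.println("  rows [" + s[0] + ", " + s[1] + ")");
            }
        }
    }
}
```

Under these assumptions, 10,000,000 rows with a value of 4 would give four
ranges of 2,500,000 rows each, so the number of splits follows directly from
whatever value mapred.map.tasks has when the splits are computed.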

Does anyone have any ideas about what's going on?

Thanks,

-- 
Dan Harvey | Datamining Engineer
www.mendeley.com/profiles/dan-harvey

Mendeley Limited | London, UK | www.mendeley.com
Registered in England and Wales | Company Number 6419015
