hadoop-general mailing list archives

From Dan Harvey <dan.har...@mendeley.com>
Subject Re: DBInputFormat number of mappers
Date Tue, 13 Apr 2010 15:09:04 GMT
Right, just after I sent this e-mail it started working, with no changes...
So setting the number of mappers in the code with:

job.getConfiguration().setInt("mapred.map.tasks", 4);

allowed me to specify the number of splits/map tasks.

Which led me to the second problem I've been having for a while. When I
start a Hadoop job using DBInputFormat as the input with, say, 5 splits,
one task starts straight away and the others stay in the initializing state
until it is done, then carry on one at a time. This doesn't happen every
time, though: with the same code and the same database, the tasks will
sometimes start in parallel!

I've read that this has happened to others before, but no clear solution
was found then.

Has anyone else had this before or found a way to solve it?

Thanks,

On 13 April 2010 15:46, Dan Harvey <dan.harvey@mendeley.com> wrote:

> Hi,
>
> I'm importing data from a MySQL database, using DBInputFormat to go over
> the rows in a table and put them into HBase with the mapper, but I can't
> find a way to increase the number of maps the input is split into. I am
> running this on a cluster of 5 nodes, each with a maximum of 2 map tasks.
> So, for example, if I set the number of rows to import to 10,000,000,
> there will only be 2 map tasks, and only two of the nodes will be used.
>
> I've tried increasing the limit manually in the code with:
>
> job.getConfiguration().setInt("mapred.map.tasks", 4);
>
> as well as setting the same property on the command line, and increasing
> the maximum number of map tasks per node.
> But in all cases mapred.map.tasks is set to 2 in the job XML config file.
>
> I've had a look at the code, and DBInputFormat splits the total number of
> rows over mapred.map.tasks, so I'm guessing the problem is just getting
> that value to change.
>
> It would be great to hear if anyone has any idea what's going on.
>
> Thanks,
>
> --
> Dan Harvey | Datamining Engineer
> www.mendeley.com/profiles/dan-harvey
>
> Mendeley Limited | London, UK | www.mendeley.com
> Registered in England and Wales | Company Number 6419015
>
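The splitting behaviour described in the quoted message (the total row count divided over mapred.map.tasks) can be modelled roughly as follows. This is a simplified, standalone Java sketch, not Hadoop's DBInputFormat source: the class RowSplits and its split method are hypothetical, and the sketch assumes the rows are divided into equal contiguous ranges, with any remainder going to the last range. It shows why a 10,000,000-row import ran as only 2 map tasks while mapred.map.tasks stayed at 2, and what 10 splits (5 nodes x 2 map slots) would look like instead.

```java
/**
 * Hypothetical, simplified model of dividing a table's row count into
 * contiguous [start, end) ranges, one per map task. This is not the
 * actual DBInputFormat implementation.
 */
public class RowSplits {

    /** Splits totalRows into numSplits ranges; the last range absorbs any remainder. */
    public static long[][] split(long totalRows, int numSplits) {
        if (numSplits < 1) {
            throw new IllegalArgumentException("numSplits must be >= 1");
        }
        long chunkSize = totalRows / numSplits;
        long[][] ranges = new long[numSplits][];
        for (int i = 0; i < numSplits; i++) {
            long start = i * chunkSize;
            // The final range runs to the end of the table so no rows are dropped.
            long end = (i == numSplits - 1) ? totalRows : start + chunkSize;
            ranges[i] = new long[] {start, end};
        }
        return ranges;
    }

    public static void main(String[] args) {
        long rows = 10_000_000L;
        // 2 splits (the value seen in the job XML) versus 10 (5 nodes x 2 map slots).
        for (int maps : new int[] {2, 10}) {
            System.out.println("mapred.map.tasks = " + maps);
            for (long[] r : split(rows, maps)) {
                System.out.println("  rows " + r[0] + " to " + r[1]
                        + " (" + (r[1] - r[0]) + " rows)");
            }
        }
    }
}
```

With 2 splits each task covers 5,000,000 rows; with 10, each covers 1,000,000, so every map slot in the cluster could be busy at once, assuming the scheduler actually runs the tasks in parallel.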

