hive-user mailing list archives

From "Sun, Rui" <>
Subject RE: Fail to Increase Hive Mapper Tasks?
Date Fri, 03 Jan 2014 06:13:44 GMT
As for the 22 mappers, I guess that your table has multiple files, probably 2.
Hive divides the desired number of map tasks evenly among the files of the table, and
the number of map tasks for a file may be rounded up when the file size does not divide
evenly into splits.
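
To illustrate the rounding described above, here is a minimal sketch of how the old Hadoop MapReduce API (org.apache.hadoop.mapred.FileInputFormat) sizes splits, written in Python for brevity. It assumes the plain FileInputFormat logic rather than whatever input format Hive actually selects, and the function names, the 100MB/50MB file sizes, and the `requested_maps` parameter are hypothetical, chosen only for illustration. It shows why a multi-file table can end up with more mappers than requested; it does not claim to reproduce the exact figure of 22.

```python
MB = 1024 * 1024
SPLIT_SLOP = 1.1  # the last split of a file may be up to 10% larger than split_size


def splits_for_file(length, split_size):
    """Count the splits produced for a single file of `length` bytes."""
    count = 0
    remaining = length
    # Cut full-size splits while more than 1.1 split sizes remain.
    while remaining / split_size > SPLIT_SLOP:
        count += 1
        remaining -= split_size
    # Whatever is left over becomes one final (possibly short) split.
    if remaining > 0:
        count += 1
    return count


def old_api_split_count(file_lengths, requested_maps,
                        block_size=64 * MB, min_size=1):
    """Approximate the old-API split count for a set of files.

    goal_size is derived from the total input size, but every file is
    split on its own, so each file can contribute one extra partial split.
    """
    total = sum(file_lengths)
    goal_size = total // max(requested_maps, 1)
    split_size = max(min_size, min(goal_size, block_size))
    return sum(splits_for_file(n, split_size) for n in file_lengths)


if __name__ == "__main__":
    # One 150MB file, 20 requested maps: divides evenly.
    print(old_api_split_count([150 * MB], 20))
    # Hypothetical 100MB + 50MB files: each file adds a partial split.
    print(old_api_split_count([100 * MB, 50 * MB], 20))
    # A fixed 8MB split size on one 150MB file, as in the vanilla job.
    print(splits_for_file(150 * MB, 8 * MB))
```

Under these assumptions a single 150MB file yields exactly 20 splits, while the hypothetical two-file layout yields 21, because the leftover bytes of each file become their own split.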

-----Original Message-----
From: Ji ZHANG [] 
Sent: Friday, January 03, 2014 12:27 PM
Subject: Re: Fail to Increase Hive Mapper Tasks?

Hi Rui,

I combined your suggestion with the answer from SO(,
and it works:

set = 20;
select count(*) from dw_stage.st_dw_marketing_touch_pi_metrics_basic;

It'll use 22 mappers, though I don't know why it's not an exact 20.
And I'm using Hive 0.9 + Hadoop 1.01.

Thank you very much.


On Fri, Jan 3, 2014 at 10:51 AM, Sun, Rui <> wrote:
> Hi, You can try set = 19.
> It seems that Hive is using the old Hadoop MapReduce API, so mapred.max.split.size
> won't work.
> -----Original Message-----
> From: Ji Zhang []
> Sent: Thursday, January 02, 2014 3:56 PM
> To:
> Subject: Fail to Increase Hive Mapper Tasks?
> Hi,
> I have a managed Hive table that contains only one 150MB file. When I run "select count(*)
> from tbl" on it, it uses 2 mappers. I want to increase that number.
> First I tried 'set mapred.max.split.size=8388608;', hoping it would use 19 mappers,
> but it only uses 3. Somehow it still splits the input by 64MB. I also tried
> 'set dfs.block.size=8388608;', which did not work either.
> Then I tried a vanilla map-reduce job to do the same thing. It initially uses 3 mappers,
> and when I set mapred.max.split.size, it uses 19. So the problem lies in Hive, I suppose.
> I read some of the Hive source code, such as CombineHiveInputFormat and ExecDriver, but
> couldn't find a clue.
> What other settings can I use?
> Thanks in advance.
> Jerry