hive-user mailing list archives

From Ji ZHANG <zhangj...@gmail.com>
Subject Re: Fail to Increase Hive Mapper Tasks?
Date Fri, 03 Jan 2014 04:27:06 GMT
Hi Rui,

I combined your suggestion with the answer from Stack Overflow
(http://stackoverflow.com/questions/20816726/fail-to-increase-hive-mapper-tasks),
and it works:

set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
set mapred.map.tasks = 20;
select count(*) from dw_stage.st_dw_marketing_touch_pi_metrics_basic;

It uses 22 mappers, though I don't know why it isn't exactly 20.
I'm using Hive 0.9 + Hadoop 1.01.
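
For reference, here is a rough sketch of how the split count is usually derived from these settings. It assumes a single file of exactly 150MB, a 64MB block size, and the split arithmetic of Hadoop 1.x FileInputFormat as I understand it (the old API sizes splits from the requested map count, the new API from mapred.max.split.size, and both allow a 10% slop on the last split). The function names are made up for illustration and are not Hadoop or Hive APIs. Note that it yields 20 for mapred.map.tasks = 20, so it does not account for the 22 mappers observed above.

```python
MB = 1024 * 1024
SPLIT_SLOP = 1.1  # assumed: the last split may be up to 10% larger than the others


def old_api_split_size(total_size, num_splits, block_size, min_size=1):
    """Old (mapred) API: the goal size comes from the requested number of map tasks."""
    goal_size = total_size // max(num_splits, 1)
    return max(min_size, min(goal_size, block_size))


def new_api_split_size(max_size, block_size, min_size=1):
    """New (mapreduce) API: the split size is capped by mapred.max.split.size."""
    return max(min_size, min(max_size, block_size))


def count_splits(file_size, split_size):
    """Count the splits for one file, letting the tail stay within the slop factor."""
    remaining = file_size
    splits = 0
    while remaining / split_size > SPLIT_SLOP:
        remaining -= split_size
        splits += 1
    if remaining > 0:
        splits += 1
    return splits


total = 150 * MB  # hypothetical exact file size
block = 64 * MB

# Old API with mapred.map.tasks = 2 (assumed default) and with 20
default_count = count_splits(total, old_api_split_size(total, 2, block))
hinted_count = count_splits(total, old_api_split_size(total, 20, block))

# New API with mapred.max.split.size = 8388608
capped_count = count_splits(total, new_api_split_size(8 * MB, block))

print(default_count, hinted_count, capped_count)
```

Under these assumptions the three cases come out to 3, 20 and 19 splits, which matches the vanilla map-reduce numbers quoted below but not the Hive result.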

Thank you very much.

Jerry

On Fri, Jan 3, 2014 at 10:51 AM, Sun, Rui <rui.sun@intel.com> wrote:
> Hi, you can try set mapred.map.tasks = 19.
> It seems that Hive is using the old Hadoop MapReduce API, so mapred.max.split.size
> won't work.
>
> -----Original Message-----
> From: Ji Zhang [mailto:zhangji87@gmail.com]
> Sent: Thursday, January 02, 2014 3:56 PM
> To: user@hive.apache.org
> Subject: Fail to Increase Hive Mapper Tasks?
>
> Hi,
>
> I have a managed Hive table, which contains only one 150MB file. I then run "select count(*)
> from tbl" on it, and it uses 2 mappers. I want to increase that number.
>
> First I tried 'set mapred.max.split.size=8388608;', hoping it would use 19 mappers.
> But it only uses 3. Somehow it still splits the input by 64MB. I also tried 'set dfs.block.size=8388608;',
> which did not work either.
>
> Then I tried a vanilla map-reduce job to do the same thing. It initially uses 3 mappers,
> and when I set mapred.max.split.size, it uses 19. So the problem lies in Hive, I suppose.
>
> I read some of the Hive source code, such as CombineHiveInputFormat and ExecDriver, but couldn't
> find a clue.
>
> What other settings can I use?
>
> Thanks in advance.
>
> Jerry
