hive-user mailing list archives

Subject Re: Re:Re: Re: RE: Why a sql only use one map task?
Date Thu, 25 Aug 2011 14:51:55 GMT
Hi Daniel
In the Hadoop ecosystem, the number of map tasks is decided by the job itself, basically from the number of input splits. Setting the number of map tasks explicitly doesn't guarantee that only that many map tasks are launched. What worked for you here is that you specified a minimum amount of data each map task should process, by setting a value for mapred.min.split.size.
So in your case there were really 9 input splits, but when you imposed a constraint on the minimum data a map task should handle, the number of map tasks came down to 3.
Bejoy K S
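The point about splits can be made concrete with a small Python model of the split calculation in Hadoop's old-API FileInputFormat (org.apache.hadoop.mapred), which HiveInputFormat relies on for plain text tables. It is a sketch under simplifying assumptions: one splittable, uncompressed file of exactly 500 MB, 64 MB blocks, and no block-location handling. The formula splitSize = max(minSize, min(goalSize, blockSize)) and the 1.1 slop factor follow that class; the function names are hypothetical. The requested map count (mapred.map.tasks) enters only through goalSize = totalSize / requested maps, which is why Hadoop treats it as a hint.

```python
SPLIT_SLOP = 1.1  # a tail up to 1.1x the split size is kept as one split


def compute_split_size(goal_size: int, min_size: int, block_size: int) -> int:
    # max(minSize, min(goalSize, blockSize)), as in the old-API FileInputFormat
    return max(min_size, min(goal_size, block_size))


def split_sizes(file_size: int, block_size: int,
                min_split_size: int = 1, map_tasks_hint: int = 1) -> list:
    """Return the byte length of each split of one splittable file (hypothetical helper)."""
    # The requested map count only sets the goal size; it cannot raise the
    # split size above the block size.
    goal_size = file_size // max(map_tasks_hint, 1)
    size = compute_split_size(goal_size, max(min_split_size, 1), block_size)
    splits = []
    remaining = file_size
    while remaining / size > SPLIT_SLOP:
        splits.append(size)
        remaining -= size
    if remaining:
        splits.append(remaining)  # leftover bytes become the last split
    return splits


MB = 1024 * 1024
FILE_SIZE = 500 * MB   # "about 500M" in the thread, taken as exactly 500 MB here
BLOCK_SIZE = 64 * MB   # the 64 MB default block size mentioned in the thread

print(len(split_sizes(FILE_SIZE, BLOCK_SIZE)))                              # 8
print(len(split_sizes(FILE_SIZE, BLOCK_SIZE, map_tasks_hint=3)))            # 8
print(len(split_sizes(FILE_SIZE, BLOCK_SIZE, min_split_size=200_000_000)))  # 3
```

Under these assumptions, asking for 3 maps leaves the split size at the block size, whereas raising the minimum split size to 200,000,000 bytes forces larger splits, which matches the drop to 3 map tasks reported in the thread.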

-----Original Message-----
From: "Daniel,Wu" <>
Date: Thu, 25 Aug 2011 20:02:43 
To: <>
Subject: Re:Re:Re: Re: RE: Why a sql only use one map task?

After I set

set mapred.min.split.size=200000000;

it kicks off 3 map tasks (the file I have is 500M). So it looks like we need to set
mapred.min.split.size instead of to control how many map tasks are kicked off.

At 2011-08-25 19:38:30,"Daniel,Wu" <> wrote:

It works after I set it as you said, but it looks like I can't control the number of map tasks: it always
uses 9 maps, even if I set

Kind | % Complete | Num Tasks | Pending | Running | Complete | Killed | Failed/Killed Task Attempts
map | | 9 | 0 | 0 | 9 | 0 | 0 / 0
reduce | | 1 | 0 | 0 | 1 | 0 | 0 / 0

At 2011-08-25 06:35:38,"Ashutosh Chauhan" <> wrote:
This may be because CombineHiveInputFormat is combining your splits into one map task. If you
don't want that to happen, do:
hive> set nputFormat
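To see how a single map task can result, here is a deliberately simplified Python model of split combining. CombineHiveInputFormat builds on Hadoop's CombineFileInputFormat, which groups blocks per node and per rack and closes a combined split once it reaches the maximum split size (mapred.max.split.size); when no maximum is set, the remaining blocks are merged without a size limit. The sketch ignores locality entirely and keeps only the size cap, so treat its counts as illustrative; the helper name and block layout are assumptions.

```python
from typing import List, Optional


def combine_blocks(block_sizes: List[int],
                   max_split_size: Optional[int] = None) -> List[List[int]]:
    """Pack blocks, in order, into combined splits (hypothetical helper).

    Simplified model: node/rack locality is ignored. A split is closed once its
    accumulated size reaches max_split_size, so it may overshoot the cap by less
    than one block. Without a cap, every block ends up in one split.
    """
    splits: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for size in block_sizes:
        current.append(size)
        current_size += size
        if max_split_size and current_size >= max_split_size:
            splits.append(current)
            current, current_size = [], 0
    if current:
        splits.append(current)  # leftover blocks form the final split
    return splits


MB = 1024 * 1024
blocks = [64 * MB] * 7 + [52 * MB]  # an illustrative ~500 MB file in 64 MB blocks

print(len(combine_blocks(blocks)))                           # 1
print(len(combine_blocks(blocks, max_split_size=128 * MB)))  # 4
```

With no cap, all eight blocks land in one combined split (one map task); capping combined splits at 128 MB yields four. The real CombineFileInputFormat also groups by node and rack first, so actual counts can differ.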

2011/8/24 Daniel,Wu<>

I pasted the information below; the map task capacity is 6. No matter what I set,
such as 3, it doesn't work: it always uses 1 map task (please see the completed job information).
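As a rough check on the capacity figure of 6: the JobTracker's Map Task Capacity is the total number of map slots across TaskTrackers, and mapred.tasktracker.map.tasks.maximum defaults to 2 slots per TaskTracker in Hadoop 0.20. Assuming each of the 3 nodes from the original question runs one TaskTracker with that default:

```latex
\text{Map Task Capacity} = 3\ \text{nodes} \times 2\ \tfrac{\text{map slots}}{\text{node}} = 6
```

Capacity only bounds how many map tasks can run at the same time; it does not decide how many map tasks a job gets, which is why a capacity of 6 still allowed a job with a single map task.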

Cluster Summary (Heap Size is 16.81 MB/966.69 MB)
Running Map Tasks | Running Reduce Tasks | Total Submissions | Nodes | Occupied Map Slots | Occupied Reduce Slots | Reserved Map Slots | Reserved Reduce Slots | Map Task Capacity | Reduce Task Capacity | Avg. Tasks/Node | Blacklisted Nodes | Excluded Nodes

Completed Jobs
Jobid | Priority | User | Name | Map % Complete | Map Total | Maps Completed | Reduce % Complete | Reduce Total | Reduces Completed | Job Scheduling Information | Diagnostic Info
job_201108242119_0001 | NORMAL | oracle | select count(*) from test(Stage-1) | 100.00%
job_201108242119_0002 | NORMAL | oracle | select count(*) from test(Stage-1) | 100.00%
job_201108242119_0003 | NORMAL | oracle | select count(*) from test(Stage-1) | 100.00%
job_201108242119_0004 | NORMAL | oracle | select period_key,count(*) from...period_key(Stage-1) | 100.00%
job_201108242119_0005 | NORMAL | oracle | select period_key,count(*) from...period_key(Stage-1) | 100.00%
job_201108242119_0006 | NORMAL | oracle | select period_key,count(*) from...period_key(Stage-1) | 100.00%


At 2011-08-24 18:19:38,wd <> wrote:
>What about your total Map Task Capacity?
>You can check it at http://your_jobtracker:50030/jobtracker.jsp

>2011/8/24 Daniel,Wu <>:
>> I checked my settings; all of them have the default values. So per the book
>> "Hadoop: The Definitive Guide", the split size should be 64M. The file
>> size is about 500M, so that's about 8 splits. And from the map job
>> information (after the map job is done), I can see it gets 8 splits from one
>> node. But it still starts only one map task.
>> At 2011-08-24 02:28:18,"Aggarwal, Vaibhav" <> wrote:
>> If your files are actually splittable, you can create more splits by setting
>> mapred.max.split.size appropriately.
>> Thanks
>> Vaibhav
>> From: Daniel,Wu []
>> Sent: Tuesday, August 23, 2011 6:51 AM
>> To: hive
>> Subject: Why a sql only use one map task?
>> I ran the following simple SQL:
>> select count(*) from sales;
>> And the job information shows it uses only one map task.
>> The underlying Hadoop cluster has 3 data/task nodes, so I expect Hive to kick
>> off 3 map tasks, one on each task node. What can make Hive run only one map
>> task? Do I need to set something to kick off multiple map tasks? I didn't
>> change any Hive configuration.
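For comparison with the expectation of one map task per node: with default settings the number of splits follows from the data size and the split size, not from the node count. Using the figures given in this thread (a file of about 500 MB and a 64 MB default split size), and assuming the split size equals the block size:

```latex
\left\lceil \frac{500\ \text{MB}}{64\ \text{MB}} \right\rceil = \lceil 7.8125 \rceil = 8\ \text{splits}
```

This is only a per-file estimate; whether those splits become separate map tasks or are merged into fewer depends on the input format, as the discussion of CombineHiveInputFormat shows.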
