hive-user mailing list archives

From bejoy...@yahoo.com
Subject Re: Re:Re: Re: RE: Why a sql only use one map task?
Date Thu, 25 Aug 2011 14:51:55 GMT
Hi Daniel
         In the Hadoop ecosystem the number of map tasks is actually decided by the job, basically
based on the number of input splits. Setting mapred.map.tasks doesn't guarantee that only that many
map tasks are triggered. What worked here for you is that you specified the minimum data volume a
map task should process by setting a value for mapred.min.split.size.
 So in your case there were really 9 input splits, but when you imposed a constraint on the
minimum data that a map task should handle, the map tasks came down to 3.
Regards
Bejoy K S
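
To make the split arithmetic above concrete, here is a minimal, self-contained Java sketch of the
per-file split-size rule Hadoop's FileInputFormat applies: splitSize = max(minSplitSize,
min(maxSplitSize, blockSize)). It is a simplification (it ignores the small slop allowed on the
last split, multi-file inputs, and Hive's CombineHiveInputFormat); the class name SplitMath and the
example sizes are illustrative assumptions, not Hadoop source.

    // Simplified sketch of the per-file split-size rule; not Hadoop's source.
    // splitSize = max(minSplitSize, min(maxSplitSize, blockSize))
    public class SplitMath {

        static long splitSize(long blockSize, long minSplitSize, long maxSplitSize) {
            return Math.max(minSplitSize, Math.min(maxSplitSize, blockSize));
        }

        // Roughly how many map tasks one splittable file of fileSize bytes produces
        // (ignores the small slop Hadoop allows on the last split).
        static long numSplits(long fileSize, long splitSize) {
            return (fileSize + splitSize - 1) / splitSize;   // ceiling division
        }

        public static void main(String[] args) {
            long fileSize  = 500L * 1024 * 1024;   // ~500 MB file, as in this thread
            long blockSize = 64L * 1024 * 1024;    // default dfs.block.size

            // Defaults: tiny minimum split, effectively unlimited maximum split.
            System.out.println(numSplits(fileSize,
                    splitSize(blockSize, 1L, Long.MAX_VALUE)));           // prints 8

            // After: set mapred.min.split.size=200000000;
            System.out.println(numSplits(fileSize,
                    splitSize(blockSize, 200000000L, Long.MAX_VALUE)));   // prints 3
        }
    }

With default settings this gives about 8 splits for a roughly 500 MB file (whether it is 8 or 9
depends on the exact file and block sizes), and raising mapred.min.split.size to 200000000 bytes
brings it down to 3, matching the counts discussed in this thread.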

-----Original Message-----
From: "Daniel,Wu" <hadoop_wu@163.com>
Date: Thu, 25 Aug 2011 20:02:43 
To: <user@hive.apache.org>
Reply-To: user@hive.apache.org
Subject: Re:Re:Re: Re: RE: Why a sql only use one map task?

After I set
set mapred.min.split.size=200000000;

it kicks off 3 map tasks (the file I have is about 500M). So it looks like we need to set
mapred.min.split.size instead of mapred.map.tasks to control how many maps are kicked off.
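
If the goal is a specific number of map tasks rather than a specific split size, the same
arithmetic can be turned around. The helper below is hypothetical (it is not part of Hive or
Hadoop) and assumes one splittable file read through HiveInputFormat; it simply derives a
mapred.min.split.size value large enough to cap that file at the desired number of splits.

    // Hypothetical helper, not part of Hive or Hadoop: derive a
    // mapred.min.split.size value that caps one file at targetMaps splits.
    public class MinSplitForMaps {

        static long minSplitFor(long fileSizeBytes, int targetMaps) {
            // ceiling of fileSize / targetMaps: any smaller minimum split size
            // would let an extra split appear
            return (fileSizeBytes + targetMaps - 1) / targetMaps;
        }

        public static void main(String[] args) {
            long fileSize = 500L * 1024 * 1024;   // ~500 MB, as in this thread
            System.out.println("set mapred.min.split.size="
                    + minSplitFor(fileSize, 3) + ";");
            // prints 174762667, which also yields 3 map tasks for this file
        }
    }

Note this only holds when splits are made per file; with CombineHiveInputFormat the combining
behaviour discussed later in the thread takes over.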


At 2011-08-25 19:38:30,"Daniel,Wu" <hadoop_wu@163.com> wrote:

It works after I set it as you said, but it looks like I can't control the number of map tasks; it
always uses 9 maps, even if I set
set mapred.map.tasks=2;


Kind     % Complete   Num Tasks   Pending   Running   Complete   Killed   Failed/Killed Task Attempts
map      100.00%      9           0         0         9          0        0 / 0
reduce   100.00%      1           0         0         1          0        0 / 0



At 2011-08-25 06:35:38,"Ashutosh Chauhan" <hashutosh@apache.org> wrote:
This may be because CombineHiveInputFormat is combining your splits into one map task. If you
don't want that to happen, do:
hive> set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat
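
For intuition about why CombineHiveInputFormat can end up with a single map task, here is a toy
illustration in Java. It is not Hive's actual code (the real CombineHiveInputFormat also considers
node and rack locality and minimum split sizes); it only shows the greedy packing idea: block-sized
pieces are merged into combined splits up to a maximum size, and with no effective maximum the
whole input lands in one combined split, hence one map task. The class name CombineSketch and the
200000000-byte cap are illustrative assumptions.

    // Toy illustration only -- not Hive's CombineHiveInputFormat. It packs
    // block-sized pieces into combined splits of at most maxCombinedSize bytes.
    import java.util.ArrayList;
    import java.util.List;

    public class CombineSketch {

        static List<Long> combine(List<Long> blockSizes, long maxCombinedSize) {
            List<Long> combined = new ArrayList<Long>();
            long current = 0;
            for (long b : blockSizes) {
                if (current > 0 && current + b > maxCombinedSize) {
                    combined.add(current);   // close the current combined split
                    current = 0;
                }
                current += b;
            }
            if (current > 0) {
                combined.add(current);
            }
            return combined;
        }

        public static void main(String[] args) {
            // ~500 MB of data as eight 64 MB blocks, as in this thread
            List<Long> blocks = new ArrayList<Long>();
            for (int i = 0; i < 8; i++) {
                blocks.add(64L * 1024 * 1024);
            }

            // No effective size cap: one combined split -> one map task
            System.out.println(combine(blocks, Long.MAX_VALUE).size());   // prints 1

            // With a ~200 MB cap on combined splits: four splits -> four map tasks
            System.out.println(combine(blocks, 200000000L).size());       // prints 4
        }
    }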


2011/8/24 Daniel,Wu<hadoop_wu@163.com>

I pasted the information below; the map task capacity is 6. And no matter how I set mapred.map.tasks
(e.g. to 3), it doesn't work, as it always uses 1 map task (please see the completed job information).



Cluster Summary (Heap Size is 16.81 MB/966.69 MB)
  Running Map Tasks: 0        Running Reduce Tasks: 0      Total Submissions: 6
  Nodes: 3                    Occupied Map Slots: 0        Occupied Reduce Slots: 0
  Reserved Map Slots: 0       Reserved Reduce Slots: 0     Map Task Capacity: 6
  Reduce Task Capacity: 6     Avg. Tasks/Node: 4.00        Blacklisted Nodes: 0
  Excluded Nodes: 0


Completed Jobs

Jobid                    Priority  User    Name                                                    Map %    Maps (total/completed)  Reduce %  Reduces (total/completed)  Job Scheduling Info  Diagnostic Info
job_201108242119_0001    NORMAL    oracle  select count(*) from test(Stage-1)                      100.00%  0 / 0                   100.00%   1 / 1                      NA                   NA
job_201108242119_0002    NORMAL    oracle  select count(*) from test(Stage-1)                      100.00%  1 / 1                   100.00%   1 / 1                      NA                   NA
job_201108242119_0003    NORMAL    oracle  select count(*) from test(Stage-1)                      100.00%  1 / 1                   100.00%   1 / 1                      NA                   NA
job_201108242119_0004    NORMAL    oracle  select period_key,count(*) from...period_key(Stage-1)  100.00%  1 / 1                   100.00%   3 / 3                      NA                   NA
job_201108242119_0005    NORMAL    oracle  select period_key,count(*) from...period_key(Stage-1)  100.00%  1 / 1                   100.00%   3 / 3                      NA                   NA
job_201108242119_0006    NORMAL    oracle  select period_key,count(*) from...period_key(Stage-1)  100.00%  1 / 1                   100.00%   3 / 3                      NA                   NA



At 2011-08-24 18:19:38,wd <wd@wdicc.com> wrote:
>What about your total Map Task Capacity?
>You may check it from http://your_jobtracker:50030/jobtracker.jsp

>
>2011/8/24 Daniel,Wu <hadoop_wu@163.com>:
>> I checked my settings, all are at the default values. So per the book
>> "Hadoop: The Definitive Guide", the split size should be 64M. And the file
>> size is about 500M, so that's about 8 splits. And from the map job
>> information (after the map job is done), I can see it gets 8 splits from one
>> node. But it still starts only one map task.
>>
>>
>>
>> At 2011-08-24 02:28:18,"Aggarwal, Vaibhav" <vaggarw@amazon.com> wrote:
>>
>> If you actually have splittable files you can set the following setting to
>> create more splits:
>>
>>
>>
>> mapred.max.split.size appropriately.
>>
>>
>>
>> Thanks
>>
>> Vaibhav
>>
>>
>>
>> From: Daniel,Wu [mailto:hadoop_wu@163.com]
>> Sent: Tuesday, August 23, 2011 6:51 AM
>> To: hive
>> Subject: Why a sql only use one map task?
>>
>>
>>
>>   I ran the following simple SQL:
>> select count(*) from sales;
>> And the job information shows it only uses one map task.
>>
>> The underlying Hadoop cluster has 3 data nodes, so I expected Hive to kick
>> off 3 map tasks, one on each node. What can make Hive run only one map
>> task? Do I need to set something to kick off multiple map tasks? I didn't
>> change the Hive config.
>>
>>
>>
>>








