hive-user mailing list archives

From Steven Wong <sw...@netflix.com>
Subject RE: Does block size determine the number of map tasks
Date Thu, 02 Jun 2011 18:04:05 GMT
I always set it, so I am not sure what the behavior is if it is not set. You should probably
always set it. See the comments/code in CombineFileInputFormat.java for details.
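A minimal sketch of what that looks like in a Hive session (the property names are the ones
CombineFileInputFormat reads; the byte values are illustrative, not recommendations):

    set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
    set mapred.max.split.size=256000000;          -- cap on the size of a combined split, in bytes
    set mapred.min.split.size.per.node=128000000; -- min bytes to combine per node (illustrative)
    set mapred.min.split.size.per.rack=128000000; -- min bytes to combine per rack (illustrative)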


From: Junxian Yan [mailto:junxian.yan@gmail.com]
Sent: Wednesday, June 01, 2011 7:54 PM
To: Steven Wong; user@hive.apache.org
Subject: Re: Dose block size determine the number of map task

Thanks. So that means Hadoop will treat the CombineHiveInputFormat input as one split if the
split parameters are not set, is that right?

R
On Wed, Jun 1, 2011 at 6:44 PM, Steven Wong <swong@netflix.com> wrote:
When using CombineHiveInputFormat, parameters such as mapred.max.split.size (and others) help
determine how the input is split across mappers. Other factors include whether your input
files are in a splittable format.
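As a rough illustration (assuming splittable input and purely illustrative numbers): with 1 GB
of input and mapred.max.split.size set to 256000000, CombineHiveInputFormat can produce about
1 GB / 256 MB = 4 splits, hence about 4 map tasks; with no max split size set, it is free to
pack all of the blocks into a single split, which yields a single mapper.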

Hope this helps.


From: Junxian Yan [mailto:junxian.yan@gmail.com]
Sent: Wednesday, June 01, 2011 12:45 AM
To: user@hive.apache.org
Subject: Dose block size determine the number of map task

I saw this in the Hadoop wiki: http://wiki.apache.org/hadoop/HowManyMapsAndReduces

But in my experiment I see a different result. I set CombineHiveInputFormat in Hive, and
according to the doc the default block size should be 64 MB. My input files total more than
64 MB, yet Hadoop still created only one map task to handle all the data.

Can you help me figure out what is wrong?

R

