hive-user mailing list archives

From Wojciech Langiewicz <wlangiew...@gmail.com>
Subject Re: Hive 0.7 using only one mapper
Date Fri, 29 Jul 2011 10:20:20 GMT
Hello,
Thank you for your answers, this solves the issue.
I have set mapred.max.split.size to 1024000000 in hive-site.xml and jobs
are now using an appropriate number of mappers.
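
For anyone who finds this thread later, the entry has the same shape as the
one Vaibhav posted below, just with a different value. This is only a sketch:
the description text is taken from his example, it has to sit inside the
<configuration> element of hive-site.xml, and 1024000000 is simply what
worked for my data, not a recommended setting.

```xml
<!-- hive-site.xml fragment: a sketch, tune the value to your own input sizes -->
<property>
  <name>mapred.max.split.size</name>
  <value>1024000000</value>
  <description>The maximum size chunk that map input should be split
  into.</description>
</property>
```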

I have played a little with different configurations and 
CombineHiveInputFormat gives better performance than HiveInputFormat in 
my case.
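
Switching between the two comes down to the hive.input.format property. A
minimal sketch, assuming the usual fully qualified class names in
org.apache.hadoop.hive.ql.io (check them against your own hive-default.xml):

```xml
<!-- hive-site.xml fragment: sketch only; the class names are assumed to match your Hive build -->
<property>
  <name>hive.input.format</name>
  <!-- use org.apache.hadoop.hive.ql.io.HiveInputFormat here to compare -->
  <value>org.apache.hadoop.hive.ql.io.CombineHiveInputFormat</value>
</property>
```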

Thanks again.
--
Wojciech Langiewicz

On 29.07.2011 05:43, Carl Steinbach wrote:
> Hi Wojciech,
>
> Vaibhav is correct. There's a configuration problem in the copy of
> hive-default.xml that ships with CDH3u1 which sets
> hive.input.format=CombineHiveInputFormat, but leaves mapred.max.split.size
> undefined. You can fix this problem by setting mapred.max.split.size in
> hive-default.xml to some reasonable value (it currently defaults
> to 256000000 on trunk).
>
> Sorry for the inconvenience.
>
> Carl
>
> On Thu, Jul 28, 2011 at 11:28 AM, Aggarwal, Vaibhav <vaggarw@amazon.com> wrote:
>
>> If you are using CombineHiveInputFormat it might be the case that all files
>> are being combined into one large split and hence 1 mapper gets created.
>>
>> If that is the case you can set the max split size in hive-default.xml
>> config file to create more splits and hence more map tasks:
>>
>> <property>
>>    <name>mapred.max.split.size</name>
>>    <value>134217728</value>
>>    <description>The maximum size chunk that map input should be split
>>    into.</description>
>> </property>
>>
>> Thanks
>> Vaibhav
>>
>> *From:* Edward Capriolo [mailto:edlinuxguru@gmail.com]
>> *Sent:* Thursday, July 28, 2011 7:10 AM
>> *To:* user@hive.apache.org
>> *Subject:* Re: Hive 0.7 using only one mapper
>>
>> On Thu, Jul 28, 2011 at 9:23 AM, Wojciech Langiewicz <wlangiewicz@gmail.com> wrote:
>>
>> Hello,
>> I'm having an issue running Hive jobs after updating from Hive 0.5 to Hive
>> 0.7 (from CDHb4 to CDHu1).
>>
>> No matter what query I run, Hive always uses one mapper.
>> I have tried different queries with various sizes of input and ones with
>> many reducers or no reducers.
>>
>> For version 0.5 everything worked correctly.
>> I'm attaching my hive-site.xml: https://gist.github.com/1111531
>> I have also tested jobs with Pig, and those jobs use multiple mappers, so
>> I guess this is a Hive issue.
>>
>> Thank you for all your help.
>>
>> --
>> Wojciech Langiewicz
>>
>>
>> You should also check that your hive-default.xml and other conf/ files are
>> up to date for 0.7.X. Having older versions of those files can lead to problems.
>>
>> Edward
>>
>
