hive-user mailing list archives

From Udit Mehta <ume...@groupon.com>
Subject Re: Disable Hive autogather optimization
Date Fri, 29 Apr 2016 22:32:12 GMT
Hi,

Thanks for the replies.
We have a scenario where an ETL job inserts into a table with
thousands of partitions using dynamic partitioning. We have certain SLAs
within which we would like the job to finish, and sometimes they are
missed (extra data or a busy cluster). I understand that stats are
essential for Hive CBO, but we are trying to explore how much overhead
stats collection adds to the job runtime. A lot of these tables are
intermediary tables, so having stats for them might not be entirely
necessary.

I just wanted to figure out whether there is an easy way to disable the
stats and then compare the performance.
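
A minimal sketch of the comparison, for illustration only. The table and
column names (etl_source, etl_target, id, payload, dt) are hypothetical,
and it assumes etl_target was freshly created, since the setting did not
seem to take effect for tables created while autogather was enabled:

```sql
-- Hypothetical names throughout; adapt to the real ETL job.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Run 1: stats gathering disabled for this session; note the job runtime.
SET hive.stats.autogather=false;
INSERT OVERWRITE TABLE etl_target PARTITION (dt)
SELECT id, payload, dt FROM etl_source;

-- Run 2: stats gathering enabled; rerun the same insert and compare.
SET hive.stats.autogather=true;
INSERT OVERWRITE TABLE etl_target PARTITION (dt)
SELECT id, payload, dt FROM etl_source;
```

Running both inserts against the same input on a similarly loaded cluster
should give a rough idea of the overhead.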

Mich, can you give more information on how to disable it in the table
struct, as I can't find any documentation on it?

Thanks again.
Udit

On Fri, Apr 29, 2016 at 10:42 AM, Pengcheng Xiong <pxiong@apache.org> wrote:

> Hi Udit,
>
>     Could you be more specific about your problem? For example, what
> settings do you have, what query do you run, what is the result, and what
> result do you expect?
>
>     From what you said, my understanding is that you want to wipe out the
> basic stats for existing tables? Could you also let us know why you
> would like to get rid of the stats? Stats are crucial for Hive CBO to work,
> and we are moving in the direction of making table/column stats
> collection automatic. It seems that you prefer the opposite direction.
> There is nothing wrong with that, and we would like to hear your ideas and
> motivation so that we can better design Hive stats collection. Thanks!
>
> Best
> Pengcheng
>
>
> On Thu, Apr 28, 2016 at 4:12 PM, Udit Mehta <umehta@groupon.com> wrote:
>
>> Any insights on this?
>>
>> On Tue, Apr 26, 2016 at 7:32 PM, Udit Mehta <umehta@groupon.com> wrote:
>>
>>> Update: Realized this works if we create a fresh table with this config
>>> already disabled but does not work if there is already a table created when
>>> this config was enabled. We now need to figure out how to disable this
>>> config for a table created when this config was true.
>>>
>>> On Tue, Apr 26, 2016 at 6:16 PM, Udit Mehta <umehta@groupon.com> wrote:
>>>
>>>> Hive version we are using is 1.2.1.
>>>>
>>>> On Tue, Apr 26, 2016 at 6:01 PM, Udit Mehta <umehta@groupon.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> We need to disable the Hive autogather stats optimization by disabling
>>>>> "*hive.stats.autogather*" but for some reason, the config change
>>>>> doesn't seem to go through. We modified this config in the hive-site.xml
>>>>> and restarted the Hive metastore. We also made this change explicitly in
>>>>> the job but it doesn't seem to help.
>>>>>
>>>>> *set hive.stats.autogather=false;*
>>>>>
>>>>> Does anyone know the right way to disable this config, since we don't
>>>>> want to compute stats in our jobs?
>>>>>
>>>>> Thanks,
>>>>> Udit
>>>>>
>>>>
>>>>
>>>
>>
>
