hive-user mailing list archives

From Loïc Chanel <loic.cha...@telecomnancy.net>
Subject Re: Quota for rogue ad-hoc queries
Date Thu, 01 Sep 2016 15:08:24 GMT
On the topic of timeouts, if I may say, they are a dangerous way to deal
with requests, as a "good" request may last longer than an "evil" one.
Be sure timeouts won't kill any important job before putting them in
place. You can set them in the parameters of the components (Tez,
MapReduce ...), but not directly in YARN. At least that was the case when I
tried this (one year ago).
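
A minimal sketch of how a timeout could be enforced from outside YARN while sparing important jobs, assuming the ResourceManager REST API is reachable without authentication. The host name, the two-hour limit and the "production" queue name are hypothetical values chosen for illustration, not settings from this thread.

```python
import json
import urllib.request

# Hypothetical values for illustration; none of these come from the thread.
RM_URL = "http://resourcemanager.example.com:8088"
MAX_ELAPSED_MS = 2 * 60 * 60 * 1000   # two hours, in milliseconds
PROTECTED_QUEUES = {"production"}     # queues the governor never touches


def select_overdue(apps_response, max_elapsed_ms, protected_queues):
    """Return IDs of apps past the time limit, skipping protected queues."""
    # "apps" may be null or missing when no application matches the filter.
    apps = (apps_response.get("apps") or {}).get("app") or []
    return [
        app["id"]
        for app in apps
        if app.get("elapsedTime", 0) > max_elapsed_ms
        and app.get("queue") not in protected_queues
    ]


def fetch_running_apps(rm_url):
    """Fetch the running applications from the ResourceManager REST API."""
    url = rm_url + "/ws/v1/cluster/apps?states=RUNNING"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def build_kill_request(rm_url, app_id):
    """Build the PUT request that asks the ResourceManager to kill one app."""
    return urllib.request.Request(
        url=f"{rm_url}/ws/v1/cluster/apps/{app_id}/state",
        data=json.dumps({"state": "KILLED"}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def run_once(rm_url=RM_URL):
    """One governor pass: list running apps and kill the overdue ones."""
    overdue = select_overdue(
        fetch_running_apps(rm_url), MAX_ELAPSED_MS, PROTECTED_QUEUES
    )
    for app_id in overdue:
        urllib.request.urlopen(build_kill_request(rm_url, app_id)).close()
    return overdue
```

Calling run_once() from a scheduler would give a crude governor. The protected-queue check is where the "good request that runs long" case has to be handled, and elapsed time is only a proxy for resource use.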

Regards,

Loïc CHANEL
System & virtualization engineer
TO - XaaS Ind - Worldline (Villeurbanne, France)

2016-09-01 16:52 GMT+02:00 Stephen Sprague <spragues@gmail.com>:

> > rogue queries
>
> so this really isn't limited to just hive is it?  any dbms system perhaps
> has to contend with this.  even malicious rogue queries as a matter of fact.
>
> timeouts are a cheap way for systems to handle this - assuming time is related to
> resource. i'm sure beeline or whatever client you use has a timeout feature.
>
> maybe one could write a separate service - say a governor - that watches
> over YARN (or hdfs or whatever resource is rare) - and terminates the
> process if it goes beyond a threshold.  think OOM killer.
>
> but, yeah, i admittedly don't know of something out there already you can
> just tap into but YARN's Resource Manager seems to be the place i'd research
> for starters. Just look at its name. :)
>
> my unsolicited 2 cents.
>
>
>
> On Wed, Aug 31, 2016 at 10:24 PM, ravi teja <raviorteja@gmail.com> wrote:
>
>> Thanks Mich,
>>
>> Unfortunately we have many insert queries.
>> Are there any other ways?
>>
>> Thanks,
>> Ravi
>>
>> On Wed, Aug 31, 2016 at 9:45 PM, Mich Talebzadeh <
>> mich.talebzadeh@gmail.com> wrote:
>>
>>> Try this
>>>
>>> hive.limit.optimize.fetch.max
>>>
>>>    - Default Value: 50000
>>>    - Added In: Hive 0.8.0
>>>
>>> Maximum number of rows allowed for a smaller subset of data for simple
>>> LIMIT, if it is a fetch query. Insert queries are not restricted by this
>>> limit.
>>>
>>>
>>> HTH
>>>
>>> Dr Mich Talebzadeh
>>>
>>>
>>>
>>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>
>>>
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>>
>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>>> any loss, damage or destruction of data or any other property which may
>>> arise from relying on this email's technical content is explicitly
>>> disclaimed. The author will in no case be liable for any monetary damages
>>> arising from such loss, damage or destruction.
>>>
>>>
>>>
>>> On 31 August 2016 at 13:42, ravi teja <raviorteja@gmail.com> wrote:
>>>
>>>> Hi Community,
>>>>
>>>> Many users run ad hoc Hive queries on our platform.
>>>> Some rogue queries managed to fill up the HDFS space, causing
>>>> mainstream queries to fail.
>>>>
>>>> We want to limit the data generated by these ad hoc queries.
>>>> We are aware of the strict parameter, which limits the data being scanned,
>>>> but it is of little help as a huge number of user tables aren't partitioned.
>>>>
>>>> Is there a way we can limit the data generated from Hive per query,
>>>> like a Hive parameter for setting HDFS quotas on the job-level *scratch*
>>>> directory, or any other approach?
>>>> What's the general approach to guardrailing such multi-tenant cases?
>>>>
>>>> Thanks in advance,
>>>> Ravi
>>>>
>>>
>>>
>>
>
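
The scratch-directory quota idea from the original question can be sketched with the standard `hdfs dfsadmin -setSpaceQuota` command. This is only an illustration under stated assumptions: it assumes Hive creates one scratch directory per user under the scratch root (`/tmp/hive/<username>` with default settings), so the limit applies per user rather than per query, and the user names and the 500g figure are hypothetical. An HDFS space quota counts replicated bytes, and it does not cover data written to final table locations outside the scratch tree.

```python
import subprocess


def user_scratch_dir(scratch_root, user):
    """Per-user scratch path, assuming the <scratch_root>/<username> layout."""
    return f"{scratch_root.rstrip('/')}/{user}"


def quota_command(directory, quota):
    """Build the `hdfs dfsadmin -setSpaceQuota` command for one directory."""
    return ["hdfs", "dfsadmin", "-setSpaceQuota", quota, directory]


def apply_quotas(users, scratch_root="/tmp/hive", quota="500g"):
    """Set the same space quota on every user's scratch directory.

    Needs HDFS superuser rights, and each directory must already exist.
    """
    for user in users:
        directory = user_scratch_dir(scratch_root, user)
        subprocess.run(quota_command(directory, quota), check=True)
```

For example, apply_quotas(["alice", "bob"]) would run the command once per user. A query that exceeds the quota then fails on write instead of filling the cluster.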
