aurora-dev mailing list archives

From Maxim Khutornenko <ma...@apache.org>
Subject Re: Aurora performance impact with hourly query runs
Date Thu, 09 Jun 2016 15:37:37 GMT
There are plenty of factors that may contribute to the behavior
you're observing. Based on the logs, though, it appears you are using
DBTaskStore (-use_beta_db_task_store=true)? If so, you may want to
revert to the default in-memory task store
(-use_beta_db_task_store=false), as DBTaskStore is known to perform
poorly with large task counts. This is a known issue, and we plan to
invest in making it faster.
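
To compare query times before and after switching task stores, the
"Query took N ms" entries quoted below can be pulled out of the
scheduler log. This is only a rough sketch: it assumes each log entry
sits on a single line in the format shown in the quoted snippet, and
the names slow_queries and threshold_ms are made up for illustration
(they are not part of Aurora). The TaskQuery arguments in the sample
are abbreviated.

```python
import re

# Matches entries like:
#   I0602 00:53:37.527 [TaskStatUpdaterService RUNNING, DbTaskStore:104] Query took 1243517 ms: ...
# "source" is the text inside the brackets up to the first comma.
SLOW_QUERY = re.compile(
    r"\[(?P<source>[^,\]]+)[^\]]*\]\s+Query\s+took\s+(?P<ms>\d+)\s+ms"
)


def slow_queries(lines, threshold_ms=1000):
    """Yield (source, milliseconds) for each query at or above threshold_ms.

    threshold_ms is a hypothetical cut-off chosen for this sketch.
    """
    for line in lines:
        match = SLOW_QUERY.search(line)
        if match and int(match.group("ms")) >= threshold_ms:
            yield match.group("source").strip(), int(match.group("ms"))


# Sample entries taken from the quoted log, with the query body shortened.
sample = [
    "I0602 00:53:37.527 [TaskStatUpdaterService RUNNING, DbTaskStore:104] "
    "Query took 1243517 ms: TaskQuery(...)",
    "I0602 00:56:54.180 [TaskReconciler-0, DbTaskStore:104] "
    "Query took 1380169 ms: TaskQuery(...)",
]

results = list(slow_queries(sample))
for source, ms in results:
    # Convert milliseconds to minutes for readability.
    print(f"{source}: {ms / 60000:.1f} min")
```

Both quoted queries come out at over 20 minutes each, so rerunning
the same extraction after the switch should show fairly quickly
whether the store is the bottleneck.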

On Thu, Jun 9, 2016 at 6:58 AM, Erb, Stephan
<Stephan.Erb@blue-yonder.com> wrote:
> I am no expert here, but I would assume that slow task store operations
> could result from a slow replicated log. Have you tried keeping it on an SSD?
> (https://github.com/apache/aurora/blob/e89521f1eebd9a5301eb02e2ed6ffebdecd54c9a/docs/operations/configuration.md#-native_log_file_path)
>
> FWIW, there was a recent RB by Maxim to reduce Master load under task
> reconciliation: https://reviews.apache.org/r/47373/diff/2#index_header
> ________________________________________
> From: Shyam Patel <sham.patel04@gmail.com>
> Sent: Thursday, June 9, 2016 07:48
> To: dev@aurora.apache.org
> Subject: Re: Aurora performance impact with hourly query runs
>
> Hi Bill,
>
> Cluster setup: AWS
>
> 1 Mesos, 1 ZK, 1 Aurora instance: 4 CPU, 16G mem
>
> Aurora: Xmx 14G
>
> 100-node agent cluster: 40 CPU, 160G mem each
>
> 8000 jobs, each with 2 instances. So, ~16K containers in total
>
>
> Thanks,
> Sham
>
>
>
>> On Jun 8, 2016, at 9:18 PM, Bill Farner <wfarner@apache.org> wrote:
>>
>> Can you give some insight into the machine specs and JVM options used?
>>
>> Also, is it 8000 jobs or tasks?  The terms are often mixed up, but they
>> make a big difference here.
>>
>> On Wednesday, June 8, 2016, Shyam Patel <sham.patel04@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> While running LnP testing, I’m spinning up 8K Docker jobs. During the run,
>>> I ran into an issue where TaskStatUpdater and TaskReconciler queries take
>>> a really long time. During that time, Aurora pretty much freezes and at
>>> some point dies.  I also tried the same run without the Docker jobs and
>>> faced the same issue.
>>>
>>>
>>> Is there a way to keep Aurora's performance intact during these query
>>> runs?
>>>
>>>
>>>
>>> Here is a snippet from the log:
>>>
>>>
>>> I0602 00:53:37.527 [TaskStatUpdaterService RUNNING, DbTaskStore:104] Query
>>> took 1243517 ms: TaskQuery(owner:null, role:null, environment:null,
>>> jobName:null, taskIds:null, statuses:[STARTING, THROTTLED, RUNNING,
>>> DRAINING, ASSIGNED, KILLING, RESTARTING, PENDING, PREEMPTING],
>>> instanceIds:null, slaveHosts:null, jobKeys:null, offset:0, limit:0)
>>>
>>>
>>> I0602 00:56:54.180 [TaskReconciler-0, DbTaskStore:104] Query took 1380169
>>> ms: TaskQuery(owner:null, role:null, environment:null, jobName:null,
>>> taskIds:null, statuses:[STARTING, RUNNING, DRAINING, ASSIGNED, KILLING,
>>> RESTARTING, PREEMPTING], instanceIds:null, slaveHosts:null, jobKeys:null,
>>> offset:0, limit:0)
>>>
>>>
>>>
>>> I'd appreciate any insights.
>>>
>>>
>>> Thanks,
>>> Sham
>>>
>>>
