aurora-dev mailing list archives

From Shyam Patel <sham.pate...@gmail.com>
Subject Re: Aurora performance impact with hourly query runs
Date Thu, 09 Jun 2016 19:49:54 GMT
Actually, ignore my previous comment. The flag is set to true for the image where I noticed the issue.

I will test with it set to ‘false’ and get back with findings.


Thanks much!

_Sham


> On Jun 9, 2016, at 12:22 PM, Shyam Patel <sham.patel04@gmail.com> wrote:
> 
> Actually, on checking, the flag ‘use_beta_db_task_store’ is false (the default).
> 
> INFO: use_beta_db_task_store (org.apache.aurora.scheduler.storage.db.DbModule.use_beta_db_task_store): false
> Jun 09, 2016 7:11:20 PM org.apache.aurora.common.args.ArgScanner process
> 
> 
> _Shyam
> 
> 
> 
>> On Jun 9, 2016, at 10:06 AM, Maxim Khutornenko <maxim@apache.org> wrote:
>> 
>> The scheduler persists its state in the Mesos replicated log regardless of
>> the in-memory engine. If you change the flag and restart the scheduler, all
>> tasks are going to be re-inserted into MemTaskStore instead of
>> DBTaskStore. No data will be lost.
>> 
>> On Thu, Jun 9, 2016 at 9:55 AM, Shyam Patel <sham.patel04@gmail.com> wrote:
>>> Thanks Maxim,
>>> 
>>> If we move to the in-memory task store, would a restart of Aurora lose the data? (BTW, I’m running Aurora in a container.)
>>> 
>>> 
>>> 
>>>> On Jun 9, 2016, at 8:37 AM, Maxim Khutornenko <maxim@apache.org> wrote:
>>>> 
>>>> There are plenty of factors that may contribute towards the behavior
>>>> you're observing. Based on the logs though it appears you are using
>>>> DBTaskStore (-use_beta_db_task_store=true)? If so, you may want to
>>>> revert to the default in-mem task store
>>>> (-use_beta_db_task_store=false) as DBTaskStore is known to perform
>>>> poorly at large task counts. This is a known issue and we plan to
>>>> invest in making it faster.
>>>> 
>>>> On Thu, Jun 9, 2016 at 6:58 AM, Erb, Stephan <Stephan.Erb@blue-yonder.com> wrote:
>>>>> I am no expert here, but I would assume that slow task store operations could result from a slow replicated log. Have you tried keeping it on an SSD? (https://github.com/apache/aurora/blob/e89521f1eebd9a5301eb02e2ed6ffebdecd54c9a/docs/operations/configuration.md#-native_log_file_path)
>>>>> 
>>>>> FWIW, there was a recent RB by Maxim to reduce Master load under task reconciliation: https://reviews.apache.org/r/47373/diff/2#index_header
>>>>> ________________________________________
>>>>> From: Shyam Patel <sham.patel04@gmail.com>
>>>>> Sent: Thursday, June 9, 2016 07:48
>>>>> To: dev@aurora.apache.org
>>>>> Subject: Re: Aurora performance impact with hourly query runs
>>>>> 
>>>>> Hi Bill,
>>>>> 
>>>>> Cluster setup: AWS
>>>>> 
>>>>> 1 Mesos, 1 ZK, 1 Aurora instance: 4 CPU, 16G mem
>>>>> 
>>>>> Aurora: Xmx 14G
>>>>> 
>>>>> 100-node agent cluster: 40 CPU, 160G mem each
>>>>> 
>>>>> 8000 jobs, each with 2 instances, so ~16K containers in total
>>>>> 
>>>>> 
>>>>> Thanks,
>>>>> Sham
>>>>> 
>>>>> 
>>>>> 
>>>>>> On Jun 8, 2016, at 9:18 PM, Bill Farner <wfarner@apache.org> wrote:
>>>>>> 
>>>>>> Can you give some insight into the machine specs and JVM options used?
>>>>>> 
>>>>>> Also, is it 8000 jobs or tasks?  The terms are often mixed up, but it will
>>>>>> make a big difference here.
>>>>>> 
>>>>>> On Wednesday, June 8, 2016, Shyam Patel <sham.patel04@gmail.com> wrote:
>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> While running LnP testing, I’m spinning up 8K Docker jobs. During the run,
>>>>>>> I ran into an issue where TaskStatUpdate and TaskReconciler queries take a
>>>>>>> really long time. During that time, Aurora pretty much freezes and at some
>>>>>>> point dies.  I also tried the same run without the Docker jobs and faced the
>>>>>>> same issue.
>>>>>>> 
>>>>>>> 
>>>>>>> Is there a way to keep Aurora's performance intact during the query runs?
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> Here is a snippet from the log:
>>>>>>> 
>>>>>>> 
>>>>>>> I0602 00:53:37.527 [TaskStatUpdaterService RUNNING, DbTaskStore:104] Query
>>>>>>> took 1243517 ms: TaskQuery(owner:null, role:null, environment:null,
>>>>>>> jobName:null, taskIds:null, statuses:[STARTING, THROTTLED, RUNNING,
>>>>>>> DRAINING, ASSIGNED, KILLING, RESTARTING, PENDING, PREEMPTING],
>>>>>>> instanceIds:null, slaveHosts:null, jobKeys:null, offset:0, limit:0)
>>>>>>> 
>>>>>>> 
>>>>>>> I0602 00:56:54.180 [TaskReconciler-0, DbTaskStore:104] Query took 1380169
>>>>>>> ms: TaskQuery(owner:null, role:null, environment:null, jobName:null,
>>>>>>> taskIds:null, statuses:[STARTING, RUNNING, DRAINING, ASSIGNED, KILLING,
>>>>>>> RESTARTING, PREEMPTING], instanceIds:null, slaveHosts:null, jobKeys:null,
>>>>>>> offset:0, limit:0)
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> Appreciate any insights..
>>>>>>> 
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Sham
>>>>>>> 
>>>>>>> 
>>> 
> 
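
The slow-query lines quoted in the thread report durations in milliseconds, which makes their scale easy to miss. Below is a minimal Python sketch, assuming the log line shape shown in the quoted snippet, that pulls those durations out of a scheduler log. The `SLOW_QUERY` pattern, the `slow_queries` helper, and the 1000 ms default threshold are illustrative choices, not part of Aurora, and the `TaskQuery(...)` arguments in the samples are abbreviated.

```python
import re

# Matches lines shaped like the ones quoted in the thread, e.g.
# "I0602 00:53:37.527 [TaskStatUpdaterService RUNNING, DbTaskStore:104] Query took 1243517 ms: ..."
SLOW_QUERY = re.compile(
    r"\[(?P<thread>[^,\]]+),\s*(?P<source>[^\]]+)\]\s+Query\s+took\s+(?P<ms>\d+)\s+ms"
)


def slow_queries(lines, threshold_ms=1000):
    """Yield (thread, source, duration_ms) for queries at or above threshold_ms."""
    for line in lines:
        match = SLOW_QUERY.search(line)
        if match and int(match.group("ms")) >= threshold_ms:
            yield match.group("thread"), match.group("source"), int(match.group("ms"))


# The two log lines from the thread, unwrapped, with the query arguments abbreviated.
sample = [
    "I0602 00:53:37.527 [TaskStatUpdaterService RUNNING, DbTaskStore:104] "
    "Query took 1243517 ms: TaskQuery(...)",
    "I0602 00:56:54.180 [TaskReconciler-0, DbTaskStore:104] "
    "Query took 1380169 ms: TaskQuery(...)",
]

for thread, source, ms in slow_queries(sample):
    print(f"{thread} ({source}): {ms / 60000:.1f} min")
```

On the two samples this reports roughly 20.7 and 23.0 minutes per query, both logged from DbTaskStore, which is consistent with the suggestion in the thread to try the in-memory task store.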

