aurora-dev mailing list archives

From Bill Farner <wfar...@apache.org>
Subject Re: Aurora performance impact with hourly query runs
Date Mon, 13 Jun 2016 03:00:53 GMT
MemTaskStore is the default.

On Sunday, June 12, 2016, <meghdoot_b@yahoo.com.invalid> wrote:

> Yes Maxim, I really appreciate the tip. That's quite a difference.
> One follow-up question: is there any reason for not making MemTaskStore the
> default in Aurora?
>
> Thx
>
> Sent from my iPhone
>
> > On Jun 12, 2016, at 9:48 AM, Shyam Patel <sham.patel04@gmail.com> wrote:
> >
> > The query performance improved drastically. It took only 29 ms for 12K
> > jobs/30K tasks (down from an hour!).
> >
> > Thanks, Maxim, for the quick lead. I really appreciate your help.
> >
> >
> >
> > Thanks,
> > Sham
> >
> >> On Jun 9, 2016, at 10:06 AM, Maxim Khutornenko <maxim@apache.org> wrote:
> >>
> >> The scheduler persists its state in the Mesos replicated log regardless of
> >> the in-memory engine. If you change the flag and restart the scheduler, all
> >> tasks are going to be re-inserted into MemTaskStore instead of
> >> DBTaskStore. No data will be lost.
> >>
> >>> On Thu, Jun 9, 2016 at 9:55 AM, Shyam Patel <sham.patel04@gmail.com> wrote:
> >>> Thanks Maxim,
> >>>
> >>> If we move to the in-memory task store, would a restart of Aurora lose the
> >>> data? (By the way, I'm running Aurora in a container.)
> >>>
> >>>
> >>>
> >>>> On Jun 9, 2016, at 8:37 AM, Maxim Khutornenko <maxim@apache.org> wrote:
> >>>>
> >>>> There are plenty of factors that may contribute towards the behavior
> >>>> you're observing. Based on the logs though it appears you are using
> >>>> DBTaskStore (-use_beta_db_task_store=true)? If so, you may want to
> >>>> revert to the default in-mem task store
> >>>> (-use_beta_db_task_store=false) as DBTaskStore is known to perform
> >>>> poorly with large task counts. This is a known issue and we plan to
> >>>> invest in making it faster.
> >>>>
> >>>> On Thu, Jun 9, 2016 at 6:58 AM, Erb, Stephan <Stephan.Erb@blue-yonder.com> wrote:
> >>>>> I am no expert here, but I would assume that slow task store
> >>>>> operations could result from a slow replicated log. Have you tried
> >>>>> keeping it on an SSD?
> >>>>> (https://github.com/apache/aurora/blob/e89521f1eebd9a5301eb02e2ed6ffebdecd54c9a/docs/operations/configuration.md#-native_log_file_path)
> >>>>>
> >>>>> FWIW, there was a recent RB by Maxim to reduce Master load under
> >>>>> task reconciliation:
> >>>>> https://reviews.apache.org/r/47373/diff/2#index_header
> >>>>> ________________________________________
> >>>>> From: Shyam Patel <sham.patel04@gmail.com>
> >>>>> Sent: Thursday, June 9, 2016 07:48
> >>>>> To: dev@aurora.apache.org
> >>>>> Subject: Re: Aurora performance impact with hourly query runs
> >>>>>
> >>>>> Hi Bill,
> >>>>>
> >>>>> Cluster Set up : AWS
> >>>>>
> >>>>> 1 Mesos , 1 ZK , 1 Aurora instance : 4 CPU, 16G mem
> >>>>>
> >>>>> Aurora : Xmx 14G
> >>>>>
> >>>>> 100 nodes agent cluster : 40 CPU, 160G mem each
> >>>>>
> >>>>> 8000 Jobs, each with 2 instances. So, total ~16K containers
> >>>>>
> >>>>>
> >>>>> Thanks,
> >>>>> Sham
> >>>>>
> >>>>>
> >>>>>
> >>>>>> On Jun 8, 2016, at 9:18 PM, Bill Farner <wfarner@apache.org> wrote:
> >>>>>>
> >>>>>> Can you give some insight into the machine specs and JVM options
> >>>>>> used?
> >>>>>>
> >>>>>> Also, is it 8000 jobs or tasks?  The terms are often mixed up, but
> >>>>>> the distinction will make a big difference here.
> >>>>>>
> >>>>>>> On Wednesday, June 8, 2016, Shyam Patel <sham.patel04@gmail.com> wrote:
> >>>>>>>
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> While running LnP testing, I'm spinning up 8K Docker jobs. During the
> >>>>>>> run, I ran into an issue where TaskStatUpdate and TaskReconciler
> >>>>>>> queries take a really long time. During that time, Aurora pretty much
> >>>>>>> freezes and at some point dies. I also tried the same run without the
> >>>>>>> Docker jobs and faced the same issue.
> >>>>>>>
> >>>>>>>
> >>>>>>> Is there a way to keep Aurora's performance intact during the query
> >>>>>>> runs?
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> Here is a snippet from the log:
> >>>>>>>
> >>>>>>>
> >>>>>>> I0602 00:53:37.527 [TaskStatUpdaterService RUNNING, DbTaskStore:104] Query
> >>>>>>> took 1243517 ms: TaskQuery(owner:null, role:null, environment:null,
> >>>>>>> jobName:null, taskIds:null, statuses:[STARTING, THROTTLED, RUNNING,
> >>>>>>> DRAINING, ASSIGNED, KILLING, RESTARTING, PENDING, PREEMPTING],
> >>>>>>> instanceIds:null, slaveHosts:null, jobKeys:null, offset:0, limit:0)
> >>>>>>>
> >>>>>>>
> >>>>>>> I0602 00:56:54.180 [TaskReconciler-0, DbTaskStore:104] Query took 1380169
> >>>>>>> ms: TaskQuery(owner:null, role:null, environment:null, jobName:null,
> >>>>>>> taskIds:null, statuses:[STARTING, RUNNING, DRAINING, ASSIGNED, KILLING,
> >>>>>>> RESTARTING, PREEMPTING], instanceIds:null, slaveHosts:null, jobKeys:null,
> >>>>>>> offset:0, limit:0)
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> Appreciate any insights..
> >>>>>>>
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Sham
> >
>
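
For anyone triaging a similar slowdown, the "Query took N ms" entries quoted in this thread can be pulled out of a scheduler log and converted to minutes. What follows is only a minimal sketch in Python, not anything shipped with Aurora: the regular expression is inferred solely from the two log lines quoted above, the names `slow_queries` and `threshold_ms` are hypothetical, and the `TaskQuery(...)` arguments in the sample are abbreviated for readability.

```python
import re

# Pattern assumed from the two log lines quoted in this thread:
#   "[<thread name>, DbTaskStore:<line>] Query took <N> ms: TaskQuery(...)"
SLOW_QUERY = re.compile(
    r"\[(?P<source>[^\]]*?),\s*DbTaskStore:\d+\]\s*Query\s+took\s+(?P<ms>\d+)\s*ms"
)


def slow_queries(log_text, threshold_ms=1000):
    """Return (source, minutes) for each DbTaskStore query at or above threshold_ms."""
    # Collapse all whitespace so entries wrapped across lines still match.
    flat = " ".join(log_text.split())
    results = []
    for match in SLOW_QUERY.finditer(flat):
        ms = int(match.group("ms"))
        if ms >= threshold_ms:
            results.append((match.group("source"), round(ms / 60000, 1)))
    return results


# Sample built from the thread's log lines (TaskQuery arguments abbreviated).
sample = """
I0602 00:53:37.527 [TaskStatUpdaterService RUNNING, DbTaskStore:104] Query
took 1243517 ms: TaskQuery(owner:null, statuses:[STARTING, RUNNING])
I0602 00:56:54.180 [TaskReconciler-0, DbTaskStore:104] Query took 1380169
ms: TaskQuery(owner:null, statuses:[STARTING, RUNNING])
"""

for source, minutes in slow_queries(sample):
    print(f"{source}: {minutes} min")
```

On the two quoted entries this reports roughly 20.7 and 23.0 minutes per query, which gives a sense of the scale of the stall seen with DBTaskStore compared with the 29 ms reported after switching back to the in-memory store.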
