aurora-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From David McLaughlin <da...@dmclaughlin.com>
Subject Re: Proposal: API changes to getTasksStatus
Date Tue, 27 May 2014 23:39:26 GMT
Pagination would be a no-op to the client because it would be opt-in only,
so it would continue to fetch all the tasks in one request.

But you raise a good point in that presumably the client is also going to
be blocked for several seconds while executing getTasksStatus for large
jobs. Making the response more lightweight could be a big win there, but I
would need a better understanding of how the client is using those
responses first.


On Tue, May 27, 2014 at 3:34 PM, Mark Chu-Carroll <mchucarroll@apache.org>wrote:

> Interestingly, when we first expanded getTasksStatus, I didn't like the
> idea, because I thought it would have exactly this problem! It's a *lot* of
> information to get in a single burst.
>
> Have you checked what effect it'll have on the command-line client? In
> general, the command-line has the context do a single API call, gathers the
> results, and returns to a command implementation. It'll definitely
> complicate things to add pagination.  How much of an effect will it be?
>
>    -Mark
>
>
>
> On Tue, May 27, 2014 at 5:32 PM, David McLaughlin <david@dmclaughlin.com
> >wrote:
>
> > As outlined in AURORA-458, using the new jobs page with a large (but
> > reasonable) number of active and complete tasks can take a long time[1]
> to
> > render. Performance profiling done as part of AURORA-471 shows that the
> > main factor in response time is rendering and returning the size of the
> > uncompressed payload to the client.
> >
> > To that end, I think we have two approaches:
> >
> >  1) Add pagination to the getTasksStatus call.
> >  2) Make the getTasksStatus response more lightweight.
> >
> >
> > Pagination
> > ---------------
> >
> > Pagination would be the simplest approach, and would scale to arbitrarily
> > large numbers of tasks moving forward. The main issue with this is that
> we
> > need all active tasks to build the configuration summary at the top of
> the
> > job page.
> >
> > As a workaround we could add a new API call - getTaskConfigSummary -
> which
> > returns something like:
> >
> >
> > struct ConfigGroup {
> >   1: TaskConfig config
> >   2: set<i32> instanceIds
> > }
> >
> > struct ConfigSummary {
> >   1: JobKey jobKey
> >   2: set<ConfigGroup> groups
> > }
> >
> >
> > To support pagination without breaking the existing API, we could add
> > offset and limit fields to the TaskQuery struct.
> >
> >
> > Make getTasksStatus more lightweight
> > ------------------------------------
> >
> > getTasksStatus currently returns a list of ScheduledTask instances. The
> > biggest (in terms of payload size) child object of a ScheduledTask is the
> > TaskConfig struct, which itself contains an ExecutorConfig.
> >
> > I took a sample response from one of our internal production instances
> and
> > it turns out that around 65% of the total response size was for
> > ExecutorConfig objects, and specifically the "cmdline" property of these.
> > We currently do not use this information anywhere in the UI nor do we
> > inspect it when grouping taskConfigs, and it would be a relatively easy
> win
> > to just drop these from the response.
> >
> > We'd still need this information for the config grouping, so we could add
> > the response suggested for getTaskConfigSummary as another property and
> > allow the client to reconcile these objects if it needs to:
> >
> >
> > struct TaskStatusResponse {
> >   1: list<LightweightTask> tasks
> >   2: set<ConfigGroup> configSummary
> > }
> >
> >
> > This would significantly reduce the uncompressed payload size while still
> > containing the same data.
> >
> > However, there is still a potentially significant part of a payload size
> > remaining: task events (and these *are* currently used in the UI). We
> could
> > solve this by dropping task events from the LightweightTask struct too,
> and
> > fetching them lazily in batches.
> >
> > i.e. an API call like:
> >
> >
> > getTaskEvents(1: JobKey key, 2: set<i32> instanceIds)
> >
> >
> > Could return:
> >
> >
> > struct TaskEventResult {
> >   1: i32 instanceId
> >   2: list<TaskEvent> taskEvents
> > }
> >
> > struct TaskEventResponse {
> >   1: JobKey key
> >   2: list<TaskEventResult> results
> > }
> >
> >
> > Events could then only be fetched and rendered as the user clicks through
> > the pages of tasks.
> >
> >
> > Proposal
> > -------------
> >
> > I think pagination makes more sense here. It adds moderate overhead to
> the
> > complexity of the UI (this is purely due to our use of smart-table which
> is
> > not so server-side pagination friendly) but the client logic would
> actually
> > be simpler with the new getTaskConfigSummary api call.
> >
> > I do think there is value in considering whether the ScheduledTask struct
> > needs to contain all of the information it does - but this could be done
> as
> > part of a separate or complimentary performance improvement ticket.
> >
> >
> >
> >
> > [1] - At Twitter we observed 2000 active + 100 finished tasks having a
> > payload size of 10MB which took 8~10 seconds to complete.
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message