helix-commits mailing list archives

From "Jiajun Wang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HELIX-655) Helix per-participant concurrent task throttling
Date Tue, 02 May 2017 22:11:04 GMT

    [ https://issues.apache.org/jira/browse/HELIX-655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15993887#comment-15993887 ]

Jiajun Wang commented on HELIX-655:
-----------------------------------

h1. Design
h2. Task Throttling Per Participant
This limitation is conceptually equivalent to the max thread pool size in TaskStateModelFactory
on the participant.
If the user constructs TaskStateModelFactory with a customized executor that has a limited
thread pool size, that participant will never execute more tasks than the threshold.
The problem is that since the limit is not known by the controller, tasks will still
be assigned to the participant. They will then be queued in the participant's thread pool and never
re-assigned.
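
For illustration, a minimal sketch of capping a participant's task threads with a custom executor; it assumes the TaskStateModelFactory constructor overload that accepts an executor (exact signatures vary across Helix versions):

{code:java}
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;

import org.apache.helix.HelixManager;
import org.apache.helix.task.TaskFactory;
import org.apache.helix.task.TaskStateModelFactory;

public class ParticipantSetup {
  // Sketch only: caps this participant at 40 concurrent tasks, but the
  // controller has no visibility into the limit and may still over-assign.
  static TaskStateModelFactory buildFactory(HelixManager manager,
      Map<String, TaskFactory> taskFactories) {
    ScheduledExecutorService taskExecutor = Executors.newScheduledThreadPool(40);
    return new TaskStateModelFactory(manager, taskFactories, taskExecutor);
  }
}
{code}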

It makes more sense to throttle tasks in the controller, at the same time that tasks are assigned
to participants.
Basically, each participant is configured with a "MaxRunningTasksNumber", and the controller
assigns tasks accordingly.

h3. Pseudo code
When calculating the best possible state in the JobRebalancer:

{code}
foreach Job in RunnableJobs:
    TaskToParticipantMapping = CalculateAssignment(Job)
    foreach MappingEntry in TaskToParticipantMapping:
        if Running_task + ToBeAssigned_task exceeds Participant_Task_Threshold:
            TaskToParticipantMapping.remove(MappingEntry)
            try the next applicable participant (consider tasks attached to the resource)
{code}

The above logic can be considered a task-queuing algorithm. However, the original assignment
still relies on the current logic, so if all participants have enough capacity, tasks will
still be evenly dispatched.
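
As a rough Java illustration, here is a self-contained sketch of the capacity check; the bookkeeping maps and the default of 40 are hypothetical stand-ins for the controller's cluster data cache:

{code:java}
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class ThrottleSketch {
  // Hypothetical capacity bookkeeping; in the real controller this data
  // would come from the cluster data cache and current states.
  static Map<String, Integer> runningTasks = new HashMap<>();
  static Map<String, Integer> maxRunningTasks = new HashMap<>();

  // Drops task assignments that would push a participant over its threshold.
  static void throttle(Map<Integer, String> taskToParticipant) {
    Map<String, Integer> toBeAssigned = new HashMap<>();
    Iterator<Map.Entry<Integer, String>> it = taskToParticipant.entrySet().iterator();
    while (it.hasNext()) {
      String participant = it.next().getValue();
      int load = runningTasks.getOrDefault(participant, 0)
          + toBeAssigned.getOrDefault(participant, 0);
      if (load >= maxRunningTasks.getOrDefault(participant, 40)) {
        it.remove(); // at capacity; a full version would try the next participant
      } else {
        toBeAssigned.merge(participant, 1, Integer::sum);
      }
    }
  }
}
{code}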
 
h3. Participant configuration
{code}
{
  "id" : "localhost_12918",
  "simpleFields" : {
    "HELIX_ENABLED" : "true",
    "HELIX_ENABLED_TIMESTAMP" : "1493326930182",
    "HELIX_HOST" : "localhost",
    "HELIX_PORT" : "12918",
    "MAX_RUNNING_TASK" : "55"
  }
}
{code}

h3. Backward compatibility
For old participants, the controller assumes a thread pool with the default capacity of 40
(equal to the default message-handling thread pool size).

h4. Assumption
Note that if some tasks have much heavier workloads than others, controlling only the number
of tasks won't work.
In this design, we assume that tasks have approximately the same workload.

h3. [Stretch] Performance optimization
The existing JobRebalancer is triggered every time a state change event happens, which means
completely sorting all pending jobs/tasks and recalculating the assignment.
A better strategy is to maintain a job priority queue in the controller:
When a job becomes runnable, enqueue it.
When a job completes, dequeue it.
On any task state update, check participant capacity and assign tasks from the queue if possible.
This refactoring is considered a stretch goal.
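
A minimal sketch of that event-driven flow in Java; the capacity and dispatch hooks are hypothetical stand-ins for the controller's cluster data cache:

{code:java}
import java.util.PriorityQueue;

public class SchedulerSketch {
  // Runnable jobs, ordered by the priority rule discussed under "Job Priority".
  private final PriorityQueue<String> runnableJobs = new PriorityQueue<>();

  void onJobRunnable(String job) { runnableJobs.offer(job); assignFromQueue(); }
  void onJobComplete(String job) { runnableJobs.remove(job); }
  void onTaskStateChange()       { assignFromQueue(); }

  private void assignFromQueue() {
    // Assign queued jobs while some participant still has free task slots.
    while (!runnableJobs.isEmpty() && hasFreeCapacity()) {
      dispatch(runnableJobs.poll());
    }
  }

  // Hypothetical hooks; a real implementation would consult current states.
  private boolean hasFreeCapacity() { return false; }
  private void dispatch(String job) { }
}
{code}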

h3. Alternative option
h4. "GlobalMaxConcurrentJobNumber" Per Cluster
The Helix controller restricts the number of running jobs.
However, with this throttling, once a job is scheduled, it occupies a slot until it finishes.
This is bad when all the running jobs are long-running: no new jobs will be scheduled.
Moreover, it is harder for an admin to set a reasonable total job count, given that workflows
and jobs usually differ considerably in their real workload.
Comparing these two options, "MaxRunningTasksPerParticipant" is directly related to a
participant's capacity. Once the controller schedules tasks according to it, we can reliably
avoid overloading the instances.
Even if we throttle jobs, there is no guarantee about the number of threads running on each
participant.
Moreover, users can already control job scheduling by adjusting the frequency at which they
submit jobs. So "GlobalMaxConcurrentJobNumber" is not necessary.

h2. Job Priority
Given limited resources, which job do we schedule first?

h3. Schedule the jobs with the highest priority first until participants are full
In this design, we propose the simplest solution for priority control.
Users can configure a job's priority, or Helix will use "age" (the time the job was scheduled)
as the priority.
If some jobs are assigned a priority and others are not, Helix assumes that jobs with a
priority setting have the higher priority.
One issue here is that if the application keeps sending high-priority jobs to Helix,
lower-priority jobs will starve.
Since this is controlled by the application (and is mostly the desired result), Helix won't
apply any additional control for these starving jobs.
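
A minimal sketch of this ordering in Java; the PendingJob fields are hypothetical, with a null priority marking jobs whose users set none:

{code:java}
import java.util.Comparator;
import java.util.PriorityQueue;

public class JobPrioritySketch {
  static class PendingJob {
    final String name;
    final Integer priority;      // null when the user set no priority
    final long scheduledTimeMs;  // "age": when the job was scheduled

    PendingJob(String name, Integer priority, long scheduledTimeMs) {
      this.name = name;
      this.priority = priority;
      this.scheduledTimeMs = scheduledTimeMs;
    }
  }

  // Jobs with an explicit priority rank ahead of unprioritized ones and
  // higher values come first; ties fall back to age, oldest first.
  static final Comparator<PendingJob> ORDER = Comparator
      .<PendingJob>comparingInt(j -> j.priority == null ? 1 : 0)
      .thenComparingInt(j -> j.priority == null ? 0 : -j.priority)
      .thenComparingLong(j -> j.scheduledTimeMs);

  static final PriorityQueue<PendingJob> RUNNABLE_JOBS = new PriorityQueue<>(ORDER);
}
{code}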

h3. Alternative options
h4. Option 1. Using per-job and per-workflow concurrency control to implement priority
WorkflowConfig.ParallelJobs and JobConfig.numConcurrentTasksPerInstance are used to control
how many jobs and tasks can be executed in parallel within a single workflow (a sketch follows
the list below).
Given that the cluster administrators configure these numbers "correctly", workflows will
eventually be assigned the expected resources.
However, there is no promise that high-priority workflows will be scheduled before others,
because tasks are picked up randomly, so the controller may end up filling the task pool
entirely with items from low-priority workflows.
Cons
# Hard for users to set up the right numbers.
# Cannot strictly ensure priority.
# May lead to low utilization.
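
For reference, a sketch of how these existing knobs are set via the builders; the method names come from the Helix task framework, but exact signatures vary across versions, so treat this as an assumption:

{code:java}
import org.apache.helix.task.JobConfig;
import org.apache.helix.task.WorkflowConfig;

public class ExistingKnobs {
  // Sets the existing per-workflow and per-job concurrency limits.
  static void configure(WorkflowConfig.Builder workflowBuilder,
      JobConfig.Builder jobBuilder) {
    workflowBuilder.setParallelJobs(4);              // jobs in parallel per workflow
    jobBuilder.setNumConcurrentTasksPerInstance(10); // tasks per participant per job
  }
}
{code}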

h4. Option 2. Jobs are assigned with execution slots according to priority
The Helix controller assigns different portions of the resources (execution slots) to jobs
according to their priority. For instance, given a total capacity of 100, each priority level
might be assigned a fixed share of the slots.
High-priority jobs will then always get a larger portion of the resources. If a job does not
use all of its portion, the algorithm should be smart enough to assign the unused portion to
other jobs.
The problem with this method is complexity. In addition, since we are not assigning all
possible resources to the highest-priority jobs, those jobs are not guaranteed to finish
quickly, and users might find that confusing.
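
A minimal sketch of proportional slot allocation under this option; the weights are hypothetical, since the design leaves the concrete shares open:

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

public class SlotShareSketch {
  // Each job gets a share of the total slots proportional to its priority
  // weight. Hypothetical inputs; unused shares would be redistributed.
  static Map<String, Integer> allocate(Map<String, Integer> weightByJob, int totalSlots) {
    int weightSum = weightByJob.values().stream().mapToInt(Integer::intValue).sum();
    Map<String, Integer> slotsByJob = new LinkedHashMap<>();
    weightByJob.forEach((job, w) -> slotsByJob.put(job, totalSlots * w / weightSum));
    return slotsByJob;
  }
}
{code}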

> Helix per-participant concurrent task throttling
> ------------------------------------------------
>
>                 Key: HELIX-655
>                 URL: https://issues.apache.org/jira/browse/HELIX-655
>             Project: Apache Helix
>          Issue Type: New Feature
>          Components: helix-core
>    Affects Versions: 0.6.x
>            Reporter: Jiajun Wang
>
> h1. Overview
> Currently, all runnable jobs/tasks in Helix are treated equally. They are all scheduled
> according to the rebalancer algorithm. That means their assignments might differ, but they
> will all be in the RUNNING state.
> This may cause an issue if there are too many concurrently runnable jobs. When the Helix
> controller starts all these jobs, the instances may be overloaded as they allocate resources
> and execute all the tasks. As a result, the jobs won't be able to finish in a reasonable
> time window.
> The issue is even more critical for long-running jobs. According to our meeting with the
> Gobblin team, when a job is scheduled, they allocate resources for it. So in the situation
> described above, more and more resources will be reserved for pending jobs, and the cluster
> will soon be exhausted.
> To work around the problem, an application needs to schedule jobs at a relatively low
> frequency (which is what Gobblin does now). This may cause low utilization.
> A better way to fix this issue, at the framework level, is to throttle the jobs/tasks that
> run concurrently, and to allow setting priorities for different jobs to control total
> execution time.
> Then, given the same set of jobs, the cluster is in a better condition, and the jobs running
> in it have a more predictable execution time.
> Existing related control mechanisms are:
> * ConcurrentTasksPerInstance for each job
> * ParallelJobs for each workflow
> * Thread pool size limits on the participant, if the user customizes TaskStateModelFactory.
> But none of them directly helps when the number of concurrent workflows or jobs is large.
> If an application keeps scheduling jobs/job queues, Helix will start any runnable jobs
> without considering the workload on the participants.
> The application may be able to configure these items carefully to achieve the goal, but it
> won't easily find the sweet spot, especially since the cluster might be changing (scaling
> out, etc.).
> h2. Problem summary
> # All runnable tasks start executing, which may overload the participants.
> # Applications need a mechanism to prioritize important jobs (or workflows). Otherwise,
> important tasks may be blocked by less important ones, and the allocated resources are
> wasted.
> h3. Features proposed
> Based on our discussions, we propose two features to help resolve the issue.
> # Running-task throttling on each participant, to avoid overloading instances.
> # Job priority control, to ensure high-priority jobs are scheduled earlier.
> In addition, applications can leverage workflow/job monitoring metrics as feedback from
> Helix to adjust their strategy.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
