kafka-jira mailing list archives

From "Ashish Surana (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (KAFKA-6642) Rack aware replica assignment in kafka streams
Date Wed, 28 Mar 2018 04:14:00 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-6642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16416778#comment-16416778 ]

Ashish Surana edited comment on KAFKA-6642 at 3/28/18 4:13 AM:
---------------------------------------------------------------

The current task assignor is sticky, and it can be made rack-aware so that the copies of a
task (active and standby replicas) are placed in different racks as far as possible.

Approach
 # RACK_ID is added to StreamsConfig and needs to be passed when starting the kafka-streams
application. All processes with the same RACK_ID are considered to be in the same rack.
 # The assignment of input topic partitions to tasks does not change.
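
As an illustration of how an instance might pass its rack when starting, here is a minimal sketch
using plain java.util.Properties. The key name "rack.id" and the helper streamsProps are assumptions
made for this example only; the proposal does not fix a property name, and no such constant is
claimed to exist in StreamsConfig. "application.id" and "bootstrap.servers" are the usual Streams
settings.

```java
import java.util.Properties;

final class RackIdConfigSketch {
    // Hypothetical key: the proposal only says that a RACK_ID is added to StreamsConfig.
    static final String RACK_ID_CONFIG = "rack.id";

    // Builds the properties an instance would start with; rackId may be null or empty.
    static Properties streamsProps(String applicationId, String bootstrapServers, String rackId) {
        Properties props = new Properties();
        props.put("application.id", applicationId);
        props.put("bootstrap.servers", bootstrapServers);
        if (rackId != null && !rackId.isEmpty()) {
            // Only set when provided; instances without it would fall into the default rack.
            props.put(RACK_ID_CONFIG, rackId);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = streamsProps("my-streams-app", "localhost:9092", "rack-a");
        System.out.println(RACK_ID_CONFIG + " = " + props.getProperty(RACK_ID_CONFIG));
    }
}
```

Two instances started with the same value would then be treated as being in the same rack.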

 

Assignment of tasks to stream instances:
 # Active tasks are assigned to the instances that previously ran the same task as active.
 # Active tasks that couldn't be assigned in the first step are assigned to the instances that
previously ran the same task as standby.
 # Active tasks that still couldn't be assigned are assigned to instances in round-robin order,
starting from the least-loaded instance.
 # The above 3 steps are the same as in StickyTaskAssignor: each task has only one active copy,
so no extra rack-aware logic is required for them.
 # Standby tasks are assigned next. Each standby goes to an instance in a different rack than
the ones where its active task and its other standby tasks are running. If we run out of racks,
standby tasks can be assigned within the same rack but on different instances.
 # This makes the assignment rack-aware, but only on a best-effort basis with no guarantees,
because some racks might have no capacity left, or there might be more replicas than racks.
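
The standby step can be sketched roughly as follows. This is a simplified, self-contained
illustration and not the StickyTaskAssignor code: the class RackAwareStandbySketch, the method
names, and the use of plain strings for task and instance ids are all assumptions made for the
example. It takes the active assignment as given, prefers the least-loaded instance in a rack the
task does not use yet, and falls back to any unused instance when the racks run out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class RackAwareStandbySketch {

    // instanceRack: instance id -> rack id; activeOwner: task id -> instance running it as active.
    // Returns task id -> instances chosen for its standby copies (best effort, may be shorter).
    static Map<String, List<String>> assignStandbys(Map<String, String> instanceRack,
                                                    Map<String, String> activeOwner,
                                                    int numStandbys) {
        // Load starts with the active tasks each instance already has.
        Map<String, Integer> load = new HashMap<>();
        for (String instance : instanceRack.keySet()) load.put(instance, 0);
        for (String owner : activeOwner.values()) load.merge(owner, 1, Integer::sum);

        Map<String, List<String>> standbys = new TreeMap<>();
        for (String task : new TreeSet<>(activeOwner.keySet())) {
            String owner = activeOwner.get(task);
            Set<String> usedInstances = new HashSet<>();
            Set<String> usedRacks = new HashSet<>();
            usedInstances.add(owner);
            usedRacks.add(instanceRack.get(owner));

            List<String> chosen = new ArrayList<>();
            for (int i = 0; i < numStandbys; i++) {
                // First try a rack that holds no copy of this task yet.
                String pick = leastLoaded(instanceRack, load, usedInstances, usedRacks, true);
                // Out of racks: accept the same rack, but still a different instance.
                if (pick == null) pick = leastLoaded(instanceRack, load, usedInstances, usedRacks, false);
                if (pick == null) break; // no instance left, so no guarantee for this replica
                chosen.add(pick);
                usedInstances.add(pick);
                usedRacks.add(instanceRack.get(pick));
                load.merge(pick, 1, Integer::sum);
            }
            standbys.put(task, chosen);
        }
        return standbys;
    }

    // Least-loaded instance not yet holding the task; ties are broken by instance id order.
    private static String leastLoaded(Map<String, String> instanceRack, Map<String, Integer> load,
                                      Set<String> usedInstances, Set<String> usedRacks,
                                      boolean requireNewRack) {
        String best = null;
        for (String instance : new TreeSet<>(instanceRack.keySet())) {
            if (usedInstances.contains(instance)) continue;
            if (requireNewRack && usedRacks.contains(instanceRack.get(instance))) continue;
            if (best == null || load.get(instance) < load.get(best)) best = instance;
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, String> instanceRack = new HashMap<>();
        instanceRack.put("A", "rack1");
        instanceRack.put("B", "rack1");
        instanceRack.put("C", "rack2");
        Map<String, String> activeOwner = new HashMap<>();
        activeOwner.put("0_0", "A");
        System.out.println(assignStandbys(instanceRack, activeOwner, 2));
    }
}
```

With two racks and two standbys, the first standby lands in the other rack and the second has to
share a rack with an existing copy, which is the best-effort behaviour described in the last step.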

Note: This makes the current StickyTaskAssignor rack-aware without changing its logic
drastically. For example, the current assignor is sticky only for active tasks; the standby task
assignment logic is not sticky, as it doesn't look at where the task was assigned previously.

Scenario#1
----
When RACK_ID is not passed to any of the stream instances.

In this case, assignment happens as it does currently in StickyTaskAssignor. All the instances
for which RACK_ID is not passed are considered to be part of a single default-rack.

 

Scenario#2
----
When RACK_ID is passed to all the stream instances.

In this case, every instance belongs to one of the provided racks, and assignment is rack-aware
as per the above approach.

 

Scenario#3
----
When RACK_ID is passed to some stream instances but not to all.

In this case, all the instances with a RACK_ID belong to the provided racks. All the instances
for which RACK_ID was not passed are considered to be part of a single default-rack.
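
All three scenarios reduce to one rule: an instance without a RACK_ID is placed in a single
default rack. A small sketch of that grouping is below; the class DefaultRackSketch, the method
groupByRack, and the literal name "default-rack" as a rack id are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class DefaultRackSketch {
    // Assumed name for the rack that collects instances started without a RACK_ID.
    static final String DEFAULT_RACK = "default-rack";

    // instanceRackIds: instance id -> configured rack id, or null/empty when none was passed.
    // Returns rack id -> instance ids in that rack, both in sorted order.
    static Map<String, List<String>> groupByRack(Map<String, String> instanceRackIds) {
        Map<String, List<String>> racks = new TreeMap<>();
        for (Map.Entry<String, String> e : new TreeMap<>(instanceRackIds).entrySet()) {
            String configured = e.getValue();
            String rack = (configured == null || configured.isEmpty()) ? DEFAULT_RACK : configured;
            racks.computeIfAbsent(rack, r -> new ArrayList<>()).add(e.getKey());
        }
        return racks;
    }

    public static void main(String[] args) {
        Map<String, String> instances = new HashMap<>();
        instances.put("i1", "rack-a"); // RACK_ID passed
        instances.put("i2", null);     // RACK_ID not passed
        instances.put("i3", "rack-b"); // RACK_ID passed
        instances.put("i4", null);     // RACK_ID not passed
        System.out.println(groupByRack(instances));
    }
}
```

When no instance passes a RACK_ID (Scenario#1), everything ends up in the one default rack, so a
rack-aware assignor has no rack to prefer and behaves like the current one.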

 

Please let us know what you think about this approach.


> Rack aware replica assignment in kafka streams
> ----------------------------------------------
>
>                 Key: KAFKA-6642
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6642
>             Project: Kafka
>          Issue Type: New Feature
>          Components: streams
>            Reporter: Ashish Surana
>            Priority: Major
>
> We have rack aware replica assignment in kafka broker ([KIP-36 Rack aware replica assignment|https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment]).
> This request is to have a similar feature for kafka streams applications. Standby tasks/standby
> replica assignment in kafka streams is currently not rack aware, and this request is to make
> it rack aware for better availability.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
