kafka-jira mailing list archives

From "Onur Karaman (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (KAFKA-5502) read current brokers from zookeeper upon processing broker change
Date Thu, 22 Jun 2017 20:02:00 GMT

     [ https://issues.apache.org/jira/browse/KAFKA-5502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Onur Karaman updated KAFKA-5502:
--------------------------------
    Description: 
[~lindong]'s testing of the 0.11.0 release revealed a controller-side performance regression
in clusters with many brokers and many partitions when bringing up many brokers simultaneously.

The regression is caused by KAFKA-5028: a Watcher receives WatchedEvent notifications from
the raw ZooKeeper client EventThread. A WatchedEvent only contains the following information:
- KeeperState
- EventType
- path

Note that it does not actually contain the current data or current set of children associated
with the data/child change notification. It is up to the user to do this lookup to see the
current data or set of children.
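
To make the point above concrete, here is a minimal sketch in plain Java. It does not use the ZooKeeper API: Notification, NotificationLookupDemo, setChildren, handle and the path "/watched" are all hypothetical stand-ins. It only illustrates that a notification carrying state, type and path forces the handler to read the children itself, and that the read returns whatever is current when the handler runs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins; these are not ZooKeeper or ZkClient classes.
class NotificationLookupDemo {
    // Carries only the three fields listed above: no data, no children.
    static final class Notification {
        final String keeperState;
        final String eventType;
        final String path;

        Notification(String keeperState, String eventType, String path) {
            this.keeperState = keeperState;
            this.eventType = eventType;
            this.path = path;
        }
    }

    // In-memory substitute for the znode tree: path -> current children.
    private final Map<String, List<String>> childrenByPath = new HashMap<>();

    void setChildren(String path, List<String> children) {
        childrenByPath.put(path, new ArrayList<>(children));
    }

    // The handler has to look up the children itself, using only the path.
    List<String> handle(Notification n) {
        List<String> current =
            childrenByPath.getOrDefault(n.path, Collections.<String>emptyList());
        return new ArrayList<>(current);
    }

    public static void main(String[] args) {
        NotificationLookupDemo demo = new NotificationLookupDemo();
        demo.setChildren("/watched", Arrays.asList("1"));
        Notification n =
            new Notification("SyncConnected", "NodeChildrenChanged", "/watched");
        // The children change again before the handler gets to run.
        demo.setChildren("/watched", Arrays.asList("1", "2"));
        System.out.println(demo.handle(n));
    }
}
```

In this model the handler sees both children even though the notification was created when there was only one, because nothing about the children travels with the notification.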

ZkClient is itself a Watcher. When it receives a WatchedEvent, it puts a ZkEvent into its
own queue which its own ZkEventThread processes. Users of ZkClient interact with these notifications
through listeners (IZkDataListener, IZkChildListener). IZkDataListener actually expects as
input the current data of the watched znode, and likewise IZkChildListener actually expects
as input the current set of children of the watched znode. In order to provide this information
to the listeners, the ZkEventThread, when processing the ZkEvent in its queue, looks up the
information (either the current data or current set of children), simultaneously sets up the
next watch, and passes the result to the listener.

The regression introduced in KAFKA-5028 lies in the time at which we look up the information needed
for the event processing.

In the past, the result of the lookup done by the ZkEventThread during ZkEvent processing was passed
to the listener, which processed it immediately. For instance, in ZkClient.fireChildChangedEvents:
{code}
List<String> children = getChildren(path);
listener.handleChildChange(path, children);
{code}
Now, however, there are multiple listeners that pass information looked up by the ZkEventThread
into a ControllerEvent, which may be processed much later. For instance, in BrokerChangeListener:
{code}
class BrokerChangeListener(controller: KafkaController) extends IZkChildListener with Logging {
  override def handleChildChange(parentPath: String, currentChilds: java.util.List[String]): Unit = {
    import JavaConverters._
    controller.addToControllerEventQueue(controller.BrokerChange(currentChilds.asScala))
  }
}
{code}

In terms of impact, this:
- increases the odds of working with stale information by the time the ControllerEvent gets
processed.
- can cause the cluster to take a long time to stabilize if you bring up many brokers simultaneously.
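
The staleness described above can be reproduced with a small model. This is only a sketch under stated assumptions: StaleSnapshotDemo, brokerStarts, currentBrokers and drain are hypothetical names, a TreeSet stands in for the watched znode's children, and a plain queue stands in for the controller event queue. No Kafka or ZkClient code is involved.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model; none of these names come from Kafka or ZkClient.
class StaleSnapshotDemo {
    // Stands in for the current children of the watched znode.
    private final Set<Integer> liveBrokers = new TreeSet<>();

    // Stands in for the controller event queue. Each entry is the snapshot
    // taken when the listener fired, not when the event is processed.
    private final Queue<List<Integer>> controllerQueue = new ArrayDeque<>();

    // A broker registers: the store changes and the listener enqueues a snapshot.
    void brokerStarts(int id) {
        liveBrokers.add(id);
        controllerQueue.add(new ArrayList<>(liveBrokers));
    }

    List<Integer> currentBrokers() {
        return new ArrayList<>(liveBrokers);
    }

    // Processes the backlog and returns the snapshot each event worked with.
    List<List<Integer>> drain() {
        List<List<Integer>> processedWith = new ArrayList<>();
        while (!controllerQueue.isEmpty()) {
            processedWith.add(controllerQueue.poll());
        }
        return processedWith;
    }

    public static void main(String[] args) {
        StaleSnapshotDemo demo = new StaleSnapshotDemo();
        // Three brokers come up before the controller processes anything.
        demo.brokerStarts(1);
        demo.brokerStarts(2);
        demo.brokerStarts(3);
        System.out.println("current:        " + demo.currentBrokers());
        System.out.println("processed with: " + demo.drain());
    }
}
```

In this model the first two events are processed with broker sets that were already out of date when processing began; only the last event matches the current state.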

In terms of how to solve it:
- (short term) ignore the result of ZkClient's lookup and repeat the lookup at the
start of ControllerEvent processing. This increases reads from 1 read per change to 2 reads per change.
This is the approach taken in this ticket.
- (long term) try to remove a queue. This basically means getting rid of ZkClient. This is
likely the approach that will be taken in KAFKA-5501. Note that with KAFKA-5501, we can revert
this short term fix so that we reduce the reads from 2 reads per change back down to 1 read
per change.
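
As a rough illustration of the short-term approach and its read cost, the sketch below discards the listener-side lookup and reads again when the queued event runs. It is a model under assumptions, not the patch for this ticket: RereadOnProcessDemo, readBrokers, brokerStarts and drain are hypothetical names, and the in-memory set stands in for ZooKeeper.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of the short-term fix; names are illustrative only.
class RereadOnProcessDemo {
    // Stands in for the current children of the watched znode.
    private final Set<Integer> liveBrokers = new TreeSet<>();

    // Stands in for the controller event queue; events carry no snapshot.
    private final Queue<Runnable> controllerQueue = new ArrayDeque<>();

    // What each event actually worked with, and how many reads were issued.
    final List<List<Integer>> processedWith = new ArrayList<>();
    int reads = 0;

    private List<Integer> readBrokers() {
        reads++;
        return new ArrayList<>(liveBrokers);
    }

    void brokerStarts(int id) {
        liveBrokers.add(id);
        // Read 1: the listener-side lookup still happens, but its result is ignored.
        readBrokers();
        // Read 2: performed only when the event is processed.
        controllerQueue.add(() -> processedWith.add(readBrokers()));
    }

    void drain() {
        while (!controllerQueue.isEmpty()) {
            controllerQueue.poll().run();
        }
    }

    public static void main(String[] args) {
        RereadOnProcessDemo demo = new RereadOnProcessDemo();
        demo.brokerStarts(1);
        demo.brokerStarts(2);
        demo.brokerStarts(3);
        demo.drain();
        System.out.println("processed with: " + demo.processedWith);
        System.out.println("reads: " + demo.reads);
    }
}
```

With three changes the model issues six reads, which matches the 2 reads per change noted above, and every event is processed against the broker set that is current at processing time.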

  was:
[~lindong]'s testing of the 0.11.0 release revealed a controller-side performance regression
in clusters with many brokers and many partitions when bringing up many brokers simultaneously.

The regression is caused by KAFKA-5028: a Watcher receives WatchedEvent notifications from
the raw ZooKeeper client EventThread. A WatchedEvent only contains the following information:
- KeeperState
- EventType
- path

Note that it does not actually contain the current data or current set of children associated
with the data/child change notification. It is up to the user to do this lookup to see the
current data or set of children.

ZkClient is itself a Watcher. When it receives a WatchedEvent, it puts a ZkEvent into its
own queue which its own ZkEventThread processes. Users of ZkClient interact with these notifications
through listeners (IZkDataListener, IZkChildListener). IZkDataListener actually expects as
input the current data of the watched znode, and likewise IZkChildListener actually expects
as input the current set of children of the watched znode. In order to provide this information
to the listeners, the ZkEventThread, when processing the ZkEvent in its queue, looks up the
information (either the current data or current set of children), simultaneously sets up the
next watch, and passes the result to the listener.

The regression introduced in KAFKA-5028 lies in the time at which we look up the information needed
for the event processing.

In the past, the result of the lookup done by the ZkEventThread during ZkEvent processing was passed
to the listener, which processed it immediately. For instance, in ZkClient.fireChildChangedEvents:
{code}
List<String> children = getChildren(path);
listener.handleChildChange(path, children);
{code}
Now, however, there are multiple listeners that pass information looked up by the ZkEventThread
into a ControllerEvent, which may be processed much later. For instance, in BrokerChangeListener:
{code}
class BrokerChangeListener(controller: KafkaController) extends IZkChildListener with Logging {
  override def handleChildChange(parentPath: String, currentChilds: java.util.List[String]): Unit = {
    import JavaConverters._
    controller.addToControllerEventQueue(controller.BrokerChange(currentChilds.asScala))
  }
}
{code}

In terms of impact, this:
- increases the odds of working with stale information by the time the ControllerEvent gets
processed.
- can cause the cluster to take a long time to stabilize if you bring up many brokers simultaneously.

In terms of how to solve it:
- (short term) ignore the result of ZkClient's lookup and repeat the lookup at the
start of ControllerEvent processing. This is the approach taken in this ticket.
- (long term) try to remove a queue. This basically means getting rid of ZkClient. This is
likely the approach that will be taken in KAFKA-5501.


> read current brokers from zookeeper upon processing broker change
> -----------------------------------------------------------------
>
>                 Key: KAFKA-5502
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5502
>             Project: Kafka
>          Issue Type: Sub-task
>            Reporter: Onur Karaman
>            Assignee: Onur Karaman
>



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
