kafka-dev mailing list archives

From "Jonathan Creasy (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KAFKA-656) Add Quotas to Kafka
Date Tue, 26 Feb 2013 06:28:12 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13586842#comment-13586842 ]

Jonathan Creasy commented on KAFKA-656:

Kafka is using the Yammer/Coda Hale Metrics library now, right?

Would it be sufficient to track the three quantities by topic and client ID, and take action
if the 1/5/15-minute load for a metric exceeded the defined thresholds? That load is an
exponentially weighted moving average (EWMA), so it would rise and taper off over time.
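
To make the idea concrete, here is a rough sketch of an EWMA-based check keyed by topic and
client ID. It is hand-rolled for illustration and does not use the Metrics library's own
classes. The names EwmaRate and QuotaTracker, the 60-second window, the 5-second tick, and
the single hard limit are all assumptions of mine, not anything that exists in Kafka.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: EWMA of a per-second rate, updated once per tick. */
class EwmaRate {
    private final double alpha;       // smoothing factor applied on each tick
    private final double tickSeconds; // length of one tick in seconds
    private long uncounted = 0;       // amount recorded since the last tick
    private double rate = 0.0;        // smoothed amount per second

    EwmaRate(double windowSeconds, double tickSeconds) {
        this.tickSeconds = tickSeconds;
        this.alpha = 1.0 - Math.exp(-tickSeconds / windowSeconds);
    }

    void mark(long amount) { uncounted += amount; }

    /** Folds the last tick's traffic into the moving average. */
    void tick() {
        double instantRate = uncounted / tickSeconds;
        uncounted = 0;
        rate += alpha * (instantRate - rate);
    }

    double rate() { return rate; }
}

/** Hypothetical sketch: one EWMA per (topic, client ID) pair, one hard limit. */
class QuotaTracker {
    private final Map<String, EwmaRate> rates = new HashMap<>();
    private final double hardLimitPerSecond;

    QuotaTracker(double hardLimitPerSecond) {
        this.hardLimitPerSecond = hardLimitPerSecond;
    }

    void record(String topic, String clientId, long amount) {
        rates.computeIfAbsent(topic + "/" + clientId,
                k -> new EwmaRate(60.0, 5.0)).mark(amount);
    }

    /** Assumed to be called by a timer every 5 seconds. */
    void tickAll() {
        for (EwmaRate r : rates.values()) {
            r.tick();
        }
    }

    boolean overQuota(String topic, String clientId) {
        EwmaRate r = rates.get(topic + "/" + clientId);
        return r != null && r.rate() > hardLimitPerSecond;
    }
}
```

Because the average starts at zero and moves only a fraction of the way toward each tick's
rate, a single burst does not trip the limit, while sustained overuse does, and the value
decays once the client backs off.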

Perhaps we could use an exponential back-off, so that a client that exceeded the quota once
would recover quickly, and each later violation would take longer to cool off before the
client is allowed again.
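
A minimal sketch of what that back-off might look like, assuming the cool-off doubles with
each violation up to a cap. The class name CoolOffPolicy, the millisecond units, and the
doubling factor are my own assumptions for illustration. The sketch never forgives old
strikes, which a real design would probably want to do after a clean period.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: per-client cool-off that doubles with each violation. */
class CoolOffPolicy {
    private final long baseMillis; // cool-off applied on the first violation
    private final long maxMillis;  // upper bound on any cool-off
    private final Map<String, Integer> strikes = new HashMap<>();
    private final Map<String, Long> blockedUntil = new HashMap<>();

    CoolOffPolicy(long baseMillis, long maxMillis) {
        this.baseMillis = baseMillis;
        this.maxMillis = maxMillis;
    }

    /** Records a quota violation and returns the cool-off applied, in ms. */
    long recordViolation(String clientId, long nowMillis) {
        int count = strikes.merge(clientId, 1, Integer::sum);
        // base * 2^(count - 1); the shift is bounded so that modest base
        // values (seconds to minutes) cannot overflow a long.
        int shift = Math.min(count - 1, 30);
        long coolOff = Math.min(maxMillis, baseMillis << shift);
        blockedUntil.put(clientId, nowMillis + coolOff);
        return coolOff;
    }

    boolean isBlocked(String clientId, long nowMillis) {
        Long until = blockedUntil.get(clientId);
        return until != null && nowMillis < until;
    }
}
```

With a base of one second and a cap of one minute, the first violation blocks the client
for one second, the second for two, the third for four, and so on until the cap is reached.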
> Add Quotas to Kafka
> -------------------
>                 Key: KAFKA-656
>                 URL: https://issues.apache.org/jira/browse/KAFKA-656
>             Project: Kafka
>          Issue Type: New Feature
>          Components: core
>    Affects Versions: 0.8.1
>            Reporter: Jay Kreps
>              Labels: project
> It would be nice to implement a quota system in Kafka to improve our support for highly
multi-tenant usage. The goal of this system would be to prevent one naughty user from accidentally
overloading the whole cluster.
> There are several quantities we would want to track:
> 1. Requests per second
> 2. Bytes written per second
> 3. Bytes read per second
> There are two reasonable groupings we would want to aggregate and enforce these thresholds
> 1. Topic level
> 2. Client level (e.g. by client id from the request)
> When a request hits one of these limits we will simply reject it with a QUOTA_EXCEEDED
> To avoid suddenly breaking things without warning, we should ideally support two thresholds:
a soft threshold at which we produce some kind of warning and a hard threshold at which we
give the error. The soft threshold could just be defined as 80% (or whatever) of the hard
> There are nuances to getting this right. If you measure second-by-second a single burst
may exceed the threshold, so we need a sustained measurement over a period of time.
> Likewise when do we stop giving this error? To make this work right we likely need to
charge against the quota for request *attempts* not just successful requests. Otherwise a
client that is overloading the server will just flap on and off--i.e. we would disable them
for a period of time but when we re-enabled them they would likely still be abusing us.
> It would be good to have a wiki design on how this would all work as a starting point for

