kafka-dev mailing list archives

From Sönke Liebau <soenke.lie...@opencore.com.INVALID>
Subject [DISCUSS] Improving ACLs by allowing ip ranges and subnet expressions?
Date Wed, 24 Jan 2018 20:59:17 GMT
Hi everyone,

the current ACL functionality in Kafka is a bit limited when it comes
to host-based rules that cover multiple hosts. A common scenario for
this is that you have a YARN cluster running Spark jobs that access
Kafka and want to create ACLs based on the IP addresses of the
cluster nodes.
Currently kafka-acls only allows specifying individual IPs, so this
would look like

./kafka-acls --add --producer \
--topic test --authorizer-properties zookeeper.connect=localhost:2181 \
--allow-principal User:spark \
--allow-host \
--allow-host \
--allow-host ...

which can get unwieldy if you have a 200 node cluster. Internally this
command would not create a single ACL with multiple host entries, but
rather one ACL per host that is specified on the command line, which
makes the ACL listing a bit confusing.

There are currently a few JIRA issues in various states around this
topic: KAFKA-3531 [1], KAFKA-4759 [2], KAFKA-4985 [3] & KAFKA-5713 [4]

KAFKA-4759 has a patch available, but it currently only adds
interpretation of CIDR notation, not arbitrary ranges. I think
support for ranges could easily be added.
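
To illustrate why CIDR blocks and explicit ranges are closely related,
here is a minimal sketch in plain Java that turns an IPv4 CIDR
expression into a first/last address pair. It is not taken from the
KAFKA-4759 patch and does not use the ipmath API; the class name
CidrSketch, the method names and the sample block 192.168.1.0/24 are
assumptions made purely for illustration.

```java
public class CidrSketch {

    // Convert a dotted-quad IPv4 string into an unsigned 32-bit value held in a long.
    static long toLong(String ip) {
        String[] parts = ip.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("not an IPv4 address: " + ip);
        }
        long value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("octet out of range in: " + ip);
            }
            value = (value << 8) | octet;
        }
        return value;
    }

    // Return {first, last} address of an IPv4 CIDR block such as "192.168.1.0/24".
    static long[] cidrToRange(String cidr) {
        String[] parts = cidr.split("/");
        if (parts.length != 2) {
            throw new IllegalArgumentException("not a CIDR expression: " + cidr);
        }
        int prefix = Integer.parseInt(parts[1]);
        if (prefix < 0 || prefix > 32) {
            throw new IllegalArgumentException("prefix out of range in: " + cidr);
        }
        // Network mask with the top `prefix` bits set, limited to 32 bits.
        long mask = (0xFFFFFFFFL << (32 - prefix)) & 0xFFFFFFFFL;
        long first = toLong(parts[0]) & mask;
        long last = first | (~mask & 0xFFFFFFFFL);
        return new long[] {first, last};
    }

    public static void main(String[] args) {
        long[] range = cidrToRange("192.168.1.0/24");
        System.out.println(range[0] + " - " + range[1]);
    }
}
```

Once a CIDR block is reduced to such a pair, an explicit range is just
the same pair supplied directly, which is why I would expect the
extension to be small.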

Colin McCabe commented in KAFKA-4985 that this has not been
implemented so far because no standard for expressing ip ranges with
a fast implementation had been found. The available patch uses the
ipmath [5] package for parsing expressions and range checking, which
seems fairly small and focused.

This would allow for expressions of the following type:

I'd suggest extending this a little to allow a semicolon separated
list of values:;;

Performance considerations
Internally the ipmath package represents ip addresses as longs, so if
we stick with the example of a 200 node cluster from above, the
current implementation would need up to 200 string comparisons for
every request, whereas with a range this could potentially come down
to two long comparisons. This is of course a back-of-the-envelope
calculation at best, but I think there is at least a case for
investigating this a bit further.
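
To make the comparison concrete, here is a rough sketch in plain Java
of the two checks. It is a simplified model of the argument above,
not Kafka's actual authorizer code and not the ipmath API; the class
HostMatchSketch, its methods and the 10.0.0.x addresses are
assumptions for illustration only.

```java
import java.util.Arrays;
import java.util.List;

public class HostMatchSketch {

    // Per-host model: one string comparison per allowed host, worst case N comparisons.
    static boolean matchesAnyHost(List<String> allowedHosts, String clientIp) {
        for (String host : allowedHosts) {
            if (host.equals(clientIp)) {
                return true;
            }
        }
        return false;
    }

    // Range model: two long comparisons, independent of how many hosts the range covers.
    static boolean inRange(long first, long last, long clientIp) {
        return first <= clientIp && clientIp <= last;
    }

    // Convert a dotted-quad IPv4 string into an unsigned 32-bit value held in a long.
    static long toLong(String ip) {
        String[] parts = ip.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("not an IPv4 address: " + ip);
        }
        long value = 0;
        for (String part : parts) {
            value = (value << 8) | Integer.parseInt(part);
        }
        return value;
    }

    public static void main(String[] args) {
        List<String> hosts = Arrays.asList("10.0.0.1", "10.0.0.2", "10.0.0.3");
        System.out.println(matchesAnyHost(hosts, "10.0.0.3"));

        long first = toLong("10.0.0.1");
        long last = toLong("10.0.0.200");
        System.out.println(inRange(first, last, toLong("10.0.0.57")));
    }
}
```

Note that the range check still needs the client address converted to
a long once per request, so the real saving depends on where that
conversion happens.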

These changes would probably necessitate a KIP. With some care they
could be made in a way that changes no existing public-facing
functionality, but for transparency and proper documentation I'd say
a KIP would be preferable.

I'd be happy to draft one if people think this is worthwhile.

Let me know what you think.

best regards,

[1] https://issues.apache.org/jira/browse/KAFKA-3531
[2] https://issues.apache.org/jira/browse/KAFKA-4759
[3] https://issues.apache.org/jira/browse/KAFKA-4985
[4] https://issues.apache.org/jira/browse/KAFKA-5713
[5] https://github.com/jgonian/commons-ip-math
