kafka-dev mailing list archives

From Ashish Singh <asi...@cloudera.com>
Subject Re: [DISCUSS] KIP-37 - Add namespaces in Kafka
Date Wed, 21 Oct 2015 19:43:11 GMT
In the last KIP hangout, the following questions were raised.

   1. *Whether or not to support a move command? If yes, how do we support it?*
   I think a *move* command will be essential once we start supporting
   directories. However, the implementation might be a bit convoluted. A few
   things it will require are the ability to mark a topic unavailable during
   the move, and updating the brokers’ metadata cache to reflect the move.
   2. *How will ACL/config inheritance work?*
   Say we have /dc/ns/topic.
   dc has dc_acl and dc_config. Similarly for ns and topic.
   To perform an action on /dc/ns/topic, the user must have the
   required permissions on dc, ns and topic for that operation. For example,
   User1 will need DESCRIBE permissions on dc, ns and topic to be able to
   describe /dc/ns/topic.
   For configs, the configs for /dc/ns/topic will be topic_config + ns_config +
   dc_config, in that order of precedence. So, if a config is specified for the
   topic then that will be used; otherwise its parent (ns) will be checked for
   that config, and so on up the hierarchy.
   3. *Will supporting n-deep hierarchy be a concern?*
   This can be a performance concern; however, it sounds more like misuse
   of the functionality or bad organization of topics. We can have a depth
   limit, but I am not sure it is required.
   4. *Will we continue to support multiple directories on disk, as proposed
   in KAFKA-188?*
   Yes, we should be able to support that. Namespaces will be created within
   those directories. The heuristics for choosing the least loaded
   disk/dir will remain the same.
   5. *Will it be required to move existing topics from the default
   directory/namespace to a particular directory/namespace to enable
   mirror-maker to replicate topics in that directory/namespace?*
   I do not think it will be required, as one can simply add /*/* to
   mirror-maker’s blacklist, so that it only captures topics that exist in
   the default namespace. @Joel, does this answer your question?
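
To make the answer to question 2 a bit more concrete, here is a rough Python sketch of how the lookups could behave. It is only an illustration under my own assumptions, not anything from the KIP: the in-memory dictionaries, the helper names (ancestors, resolve_config, is_authorized) and the sample config values and users are all made up for the example.

```python
from typing import Dict, List, Optional, Set

# Hypothetical per-level stores keyed by path. The keys and values are
# illustrative only; a real broker would load these from its metadata store.
CONFIGS: Dict[str, Dict[str, str]] = {
    "/dc": {"retention.ms": "604800000", "cleanup.policy": "delete"},
    "/dc/ns": {"retention.ms": "86400000"},
    "/dc/ns/topic": {"cleanup.policy": "compact"},
}
ACLS: Dict[str, Dict[str, Set[str]]] = {
    "/dc": {"User1": {"DESCRIBE"}, "User2": {"DESCRIBE"}},
    "/dc/ns": {"User1": {"DESCRIBE"}, "User2": {"DESCRIBE"}},
    "/dc/ns/topic": {"User1": {"DESCRIBE"}},
}


def ancestors(path: str) -> List[str]:
    """Return the path and all of its parents, most specific first."""
    parts = [p for p in path.split("/") if p]
    return ["/" + "/".join(parts[:i]) for i in range(len(parts), 0, -1)]


def resolve_config(path: str, key: str) -> Optional[str]:
    """Walk topic -> ns -> dc and return the first value found for key."""
    for level in ancestors(path):
        value = CONFIGS.get(level, {}).get(key)
        if value is not None:
            return value
    return None  # not set at any level; a broker default would apply


def is_authorized(user: str, operation: str, path: str) -> bool:
    """The user needs the operation on every level: dc, ns and topic."""
    return all(
        operation in ACLS.get(level, {}).get(user, set())
        for level in ancestors(path)
    )
```

With this sample data, retention.ms for /dc/ns/topic comes from ns (the topic does not set it), cleanup.policy comes from the topic itself, and User2 cannot describe /dc/ns/topic because the DESCRIBE permission is missing at the topic level even though it is present on dc and ns.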

​

On Fri, Oct 16, 2015 at 6:33 PM, Ashish Singh <asingh@cloudera.com> wrote:

> On Thu, Oct 15, 2015 at 1:30 PM, Jiangjie Qin <jqin@linkedin.com.invalid>
> wrote:
>
>> Hey Jay,
>>
>> If we allow consumer to subscribe to /*/my-event, does that mean we allow
>> consumer to consume cross namespaces?
>
> That is the idea. If a user has permissions then yes, he should be able to
> consume from as many namespaces as he wants.
>
>
>> In that case it seems not
>> "hierarchical" but more like a name field filtering. i.e. user can choose
>> to consume from topic where datacenter={x,y},
>> topic_name={my-topic1,mytopic2}. Am I understanding right?
>>
> I think it is still hierarchical, however with possible filtering (as you
> said).
>
>>
>> Thanks,
>>
>> Jiangjie (Becket) Qin
>>
>> On Wed, Oct 14, 2015 at 12:49 PM, Jay Kreps <jay@confluent.io> wrote:
>>
>> > Hey Jason,
>> >
>> > I actually think this is one of the advantages. The problem we have today
>> > is that you can't really do bidirectional replication between clusters
>> > because it would actually be a feedback loop.
>> >
>> > So the intended use would be that you would have a structure where the
>> > top-level directory was DIFFERENT but the topic names were the same, so if
>> > you maintain
>> >   /chicago-datacenter/actual-topics
>> >   /oregon-datacenter/actual-topics
>> >   etc.
>> > Then you replicate
>> >   /chicago-datacenter/* => /oregon-datacenter
>> > and
>> >   /oregon-datacenter/* => /chicago-datacenter
>> >
>> > People who want the aggregate feed subscribe to /*/my-event.
>> >
>> > The nice thing about this is it gives a unified namespace across all
>> > locations.
>> >
>> > Basically exactly what we do now but you no longer need to add new clusters
>> > to get the namespacing.
>> >
>> > -Jay
>> >
>> >
>> > On Wed, Oct 14, 2015 at 11:24 AM, Jason Gustafson <jason@confluent.io>
>> > wrote:
>> >
>> > > Hey Ashish, thanks for the write-up. I think having a namespace capability
>> > > is a useful feature for Kafka, in particular with the addition of the
>> > > authorization layer. I probably prefer Jay's hierarchical approach if we're
>> > > going to embed the namespace in the topic name since it seems more general.
>> > > That said, one advantage of having a namespace independent of the topic
>> > > name is that it simplifies replication between namespaces a bit since you
>> > > don't have to parse and rewrite topic names. Assuming that hierarchical
>> > > topics will happen eventually anyway, I imagine a common pattern would be
>> > > to preserve the same directory structure in multiple namespaces, so having
>> > > an easy mechanism for applications to switch between them would be nice.
>> > > The namespace is kind of analogous to a chroot in this case. Of course you
>> > > can achieve the same thing by having a configurable topic prefix, just you
>> > > have to do all the topic rewriting, which I'm guessing will be a little
>> > > annoying to implement in all of the clients and tools. However, the
>> > > tradeoff (as you mention in the KIP) is that all request schemas have to be
>> > > updated, which is also annoying.
>> > >
>> > > -Jason
>> > >
>> > > On Wed, Oct 14, 2015 at 12:03 AM, Ashish Singh <asingh@cloudera.com>
>> > > wrote:
>> > >
>> > > > On Mon, Oct 12, 2015 at 7:37 PM, Gwen Shapira <gwen@confluent.io>
>> > wrote:
>> > > >
>> > > > > This works really nicely from the consumer side, but what about the
>> > > > > producer? If there are no more topics, do we allow producing to a
>> > > > > directory and have the Partitioner hash-partition messages between all
>> > > > > partitions in the multiple levels in a directory?
>> > > > >
>> > > > Good point.
>> > > >
>> > > > I am personally in favor of maintaining the current behavior for the
>> > > > producer, i.e., letting users produce only to a topic. This is different
>> > > > for consumers, where the suggested behavior is in line with current
>> > > > behavior: one can use regex subscription to achieve the same even today.
>> > > >
>> > > > >
>> > > > > Also, I think we want to preserve the consumer terminology of "subscribe"
>> > > > > to topics / directories, but "assign" partitions - since the consumer
>> > > > > behavior is different in those cases.
>> > > > >
>> > > > > On Mon, Oct 12, 2015 at 7:16 PM, Jay Kreps <jay@confluent.io>
>> wrote:
>> > > > >
>> > > > > > Okay this is similar to what I think we have talked about before. Let me
>> > > > > > elaborate on the idea that I think has been floating around--it's pretty
>> > > > > > similar with a few differences.
>> > > > > >
>> > > > > > I think what you are calling the "default namespace" is basically what I
>> > > > > > would call the "current working directory" with paths not beginning with
>> > > > > > '/' being interpreted relative to this directory as in the fs.
>> > > > > >
>> > > > > > One thing you have to work out is what levels in this hierarchy you can
>> > > > > > actually subscribe to. I think you are assuming only what we currently
>> > > > > > consider a "topic", i.e. the first level of directories but not the
>> > > > > > partitions or parent dirs, would be subscribable. If you think about it,
>> > > > > > though, that constraint is a bit arbitrary.
>> > > > > >
>> > > > > > I'd propose instead the semantics that:
>> > > > > > - Subscribing to /a/b/c/0 means subscribing to the 0th partition of topic
>> > > > > > "c" in directory /a/b
>> > > > > > - Subscribing to /a/b/c means subscribing to all partitions in
>> > > > > > topic/directory "c"
>> > > > > > - Subscribing to /a/b means subscribing to all partitions in all
>> > > > > > topics/subdirectories under a/b recursively
>> > > > > >
>> > > > > > Effectively the concept of topics goes away entirely--you just have
>> > > > > > partitions/logs and directories. In this respect rather than adding new
>> > > > > > concepts this new feature would actually just generalize what we have
>> > > > > > (which I think is a good thing).
>> > > > > >
>> > > > > > -Jay
>> > > > > >
>> > > > > > On Mon, Oct 12, 2015 at 6:24 PM, Ashish Singh <asingh@cloudera.com>
>> > > > > > wrote:
>> > > > > >
>> > > > > > > On Mon, Oct 12, 2015 at 5:42 PM, Jay Kreps <jay@confluent.io>
>> > > wrote:
>> > > > > > >
>> > > > > > > > Great. I definitely would strongly favor carrying over user's intuition
>> > > > > > > > from FS unless we think we need a very different model. The minor details
>> > > > > > > > like the separator and namespace term will help with that.
>> > > > > > > >
>> > > > > > > > Follow-up question, say I have a layout like
>> > > > > > > >    /chicago-datacenter/user-events/pageviews
>> > > > > > > > Can I subscribe to
>> > > > > > > >    /chicago-datacenter/user-events
>> > > > > > > >
>> > > > > > > Yes, however they will need a regex like
>> > > > > > > /chicago-datacenter/user-events/*
>> > > > > > >
>> > > > > > > > to get the full firehose of user events from chicago? Can I subscribe to
>> > > > > > > >    /*/user-events
>> > > > > > > > to get user events originating from all datacenters?
>> > > > > > > >
>> > > > > > > Yes
>> > > > > > >
>> > > > > > > >
>> > > > > > > > (Assuming, for now, that these are all in the same cluster...)
>> > > > > > > >
>> > > > > > > > Also, just to confirm, it sounds from the proposal like config overrides
>> > > > > > > > would become fully hierarchical so you can override config at any directory
>> > > > > > > > point. This will add complexity in implementation but I think will likely
>> > > > > > > > be much more operator friendly.
>> > > > > > > >
>> > > > > > > Yes, that is the idea.
>> > > > > > >
>> > > > > > > >
>> > > > > > > > There are about a thousand details to discuss in terms of how this would
>> > > > > > > > impact the metadata request, various zk entries, and various other aspects,
>> > > > > > > > but probably it makes sense to first agree on how we would want it to work
>> > > > > > > > and then start to dive into how to implement that.
>> > > > > > > >
>> > > > > > > Agreed.
>> > > > > > >
>> > > > > > > >
>> > > > > > > > -Jay
>> > > > > > > >
>> > > > > > > > On Mon, Oct 12, 2015 at 5:28 PM, Ashish Singh <asingh@cloudera.com>
>> > > > > > > > wrote:
>> > > > > > > >
>> > > > > > > > > Hey Jay, thanks for reviewing the proposal. Answers inline.
>> > > > > > > > >
>> > > > > > > > > On Mon, Oct 12, 2015 at 10:53 AM, Jay Kreps <jay@confluent.io> wrote:
>> > > > > > > > >
>> > > > > > > > > > Hey guys,
>> > > > > > > > > >
>> > > > > > > > > > I think this is an important feature and one we've talked about for a
>> > > > > > > > > > while. I really think trying to invent a new nomenclature is going to make
>> > > > > > > > > > it hard for people to understand, though. As such I recommend we call
>> > > > > > > > > > namespaces "directories" and denote them with '/'--this will make the
>> > > > > > > > > > feature 1000x more understandable to people.
>> > > > > > > > >
>> > > > > > > > > Essentially you are suggesting two things here.
>> > > > > > > > > 1. Use "Directory" instead of "Namespace" as it is more intuitive. I agree.
>> > > > > > > > > 2. Make '/' as delimiter instead of ':'. Fine with me and I agree if we
>> > > > > > > > > call these directories, '/' is the way to go.
>> > > > > > > > >
>> > > > > > > > > > I think we should inherit the
>> > > > > > > > > > semantics of normal unix fs in so far as it makes sense.
>> > > > > > > > > >
>> > > > > > > > > > In this approach we get rid of topics entirely, instead we really just have
>> > > > > > > > > > partitions which are the equivalent of a file and retain their numeric
>> > > > > > > > > > names, and the existing topic concept is just the first directory level but
>> > > > > > > > > > we generalize to allow arbitrarily many more levels of nesting. This allows
>> > > > > > > > > > categorization of data, such as /datacenter1/user-events/page-views/3 and
>> > > > > > > > > > you can subscribe, apply configs or permissions at any level of the
>> > > > > > > > > > hierarchy.
>> > > > > > > > > >
>> > > > > > > > > +1. This actually requires just a minor change to existing proposal, i.e.,
>> > > > > > > > > "some:namespace:topic" becomes "some/namespace/topic".
>> > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > I'm actually not 100% sure what the semantics of accessing data in
>> > > > > > > > > > differing namespaces is in the current proposal, maybe you can clarify
>> > > > > > > > > > Ashish?
>> > > > > > > > >
>> > > > > > > > > I will add more info to KIP on this, however I think a client should be
>> > > > > > > > > able to access data in any namespace as long as following conditions are
>> > > > > > > > > satisfied.
>> > > > > > > > >
>> > > > > > > > > 1. Namespace, the client is trying to access, exists.
>> > > > > > > > > 2. The client has sufficient permissions on the namespace for type of
>> > > > > > > > > operation the client is trying to perform on a topic within that namespace.
>> > > > > > > > > 3. The client has sufficient permissions on the topic for type of operation
>> > > > > > > > > the client is trying to perform on that topic.
>> > > > > > > > >
>> > > > > > > > > If we choose to go with what you suggested earlier that just have hierarchy
>> > > > > > > > > of directories, then step 3 will actually be covered in step 2.
>> > > > > > > > >
>> > > > > > > > > In the current proposal, consumers will subscribe to a topic in a namespace
>> > > > > > > > > by specifying <namespace>:<topic> as the topic name. They can subscribe to
>> > > > > > > > > topics from multiple namespaces.
>> > > > > > > > >
>> > > > > > > > > Let me know if I totally missed your question.
>> > > > > > > > >
>> > > > > > > > > > Since the point of Kafka is sharing data I think it is really
>> > > > > > > > > > important that the grouping be just for convenience/permissions/config/etc
>> > > > > > > > > > and that it remain possible to access multiple directories/namespaces from
>> > > > > > > > > > the same client.
>> > > > > > > > > >
>> > > > > > > > > Totally agree with you.
>> > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > -Jay
>> > > > > > > > > >
>> > > > > > > > > > On Fri, Oct 9, 2015 at 6:32 PM, Ashish Singh <asingh@cloudera.com> wrote:
>> > > > > > > > > >
>> > > > > > > > > > > Hey Guys,
>> > > > > > > > > > >
>> > > > > > > > > > > I just created KIP-37 for adding namespaces to Kafka.
>> > > > > > > > > > >
>> > > > > > > > > > > KIP-37
>> > > > > > > > > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-37+-+Add+Namespaces+to+Kafka>
>> > > > > > > > > > > tracks the proposal.
>> > > > > > > > > > >
>> > > > > > > > > > > The idea is to make Kafka support multi-tenancy via namespaces.
>> > > > > > > > > > >
>> > > > > > > > > > > Feedback and comments are welcome.
>> > > > > > > > > > > ​
>> > > > > > > > > > > --
>> > > > > > > > > > >
>> > > > > > > > > > > Regards,
>> > > > > > > > > > > Ashish
>> > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > --
>> > > > > > > > >
>> > > > > > > > > Regards,
>> > > > > > > > > Ashish
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > > --
>> > > > > > >
>> > > > > > > Regards,
>> > > > > > > Ashish
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > > >
>> > > >
>> > > > --
>> > > >
>> > > > Regards,
>> > > > Ashish
>> > > >
>> > >
>> >
>>
>
>
>
> --
>
> Regards,
> Ashish
>



-- 

Regards,
Ashish
