kafka-dev mailing list archives

From Ashish Singh <asi...@cloudera.com>
Subject Re: [DISCUSS] KIP-37 - Add namespaces in Kafka
Date Wed, 21 Oct 2015 22:47:15 GMT
On Wed, Oct 21, 2015 at 2:22 PM, Jay Kreps <jay@confluent.io> wrote:

> Gwen, It's a good question of what the producer semantics are--would we
> only allow you to produce to a partition or first level directory or would
> we hash over whatever subtree you supply? Actually not sure which makes
> more sense...
>
> Ashish, here are some thoughts:
> 1. I think we can do this online. There is a question of what happens to
> readers and writers but presumably it would be the same thing as if that topic
> weren't there. There would be no guarantee this would happen atomically over
> different brokers or clients, though.
> 2. ACLs should work like unix perms, right?


Are you suggesting we should move the allowed operations to the R, W, X model
of unix? Currently, we support these operations:
<https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/security/auth/Operation.scala#L25>
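
To make the ACL inheritance rule from the hangout notes quoted further down
concrete (a user needs the permission on dc, ns and topic to act on
/dc/ns/topic), here is a minimal sketch. The function name, the ACL table
shape and the sample data are assumptions for illustration only; this is not
Kafka's authorizer API.

```python
from typing import Dict, Set, Tuple

# Hypothetical ACL table: directory path -> set of (user, operation) grants.
Acls = Dict[str, Set[Tuple[str, str]]]


def is_authorized(user: str, operation: str, path: str, acls: Acls) -> bool:
    """Allow the operation only if it is granted at every level of the path."""
    parts = [p for p in path.split("/") if p]
    if not parts:
        return False  # assumption: nothing is granted on the bare root
    # Check /dc, then /dc/ns, then /dc/ns/topic.
    return all(
        (user, operation) in acls.get("/" + "/".join(parts[:depth]), set())
        for depth in range(1, len(parts) + 1)
    )


acls: Acls = {
    "/dc": {("User1", "DESCRIBE")},
    "/dc/ns": {("User1", "DESCRIBE")},
    "/dc/ns/topic": {("User1", "DESCRIBE"), ("User2", "DESCRIBE")},
}
print(is_authorized("User1", "DESCRIBE", "/dc/ns/topic", acls))
print(is_authorized("User2", "DESCRIBE", "/dc/ns/topic", acls))
```

User2 holds the grant on the topic only, so the second check fails under this
rule, which is the difference from a model where only the leaf is checked.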

> I think configs would override
> hierarchically, so we would have a full set of configs for each partition
> computed by walking up the tree from the root and taking the first
> override). I think this is what you're describing, right?
>

Yes.
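
A minimal sketch of that lookup, assuming per-directory overrides are kept in
a plain map keyed by absolute path. The names resolve_config, overrides and
defaults are hypothetical, and the config keys are only sample data.

```python
from typing import Dict, Optional

# Hypothetical store of overrides: directory path -> {config key: value}.
Overrides = Dict[str, Dict[str, str]]


def resolve_config(path: str, key: str, overrides: Overrides,
                   defaults: Dict[str, str]) -> Optional[str]:
    """Return the first override of `key` found walking from `path` up to '/'."""
    parts = [p for p in path.split("/") if p]
    # Check /dc/ns/topic, then /dc/ns, then /dc, then the root "/".
    for depth in range(len(parts), -1, -1):
        node = "/" + "/".join(parts[:depth])
        value = overrides.get(node, {}).get(key)
        if value is not None:
            return value
    # No directory on the path overrides the key: fall back to the defaults.
    return defaults.get(key)


overrides: Overrides = {
    "/dc": {"retention.ms": "1000"},
    "/dc/ns/topic": {"cleanup.policy": "compact"},
}
defaults = {"cleanup.policy": "delete"}
print(resolve_config("/dc/ns/topic", "retention.ms", overrides, defaults))
print(resolve_config("/dc/ns/topic", "cleanup.policy", overrides, defaults))
```

The most specific directory wins, so the topic-level value shadows anything
set on ns or dc, and a key set only on dc is still visible from the topic.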

> 3. Totally agree no reason to have an arbitrary limit.
> 4. I actually don't think the physical layout on disk should be at all
> connected to the logical directory hierarchy we present.


I think it will be useful to have that connection, as it will enable users
to encrypt different namespaces with different keys. That would be one more
step towards a completely multi-tenant system.


> That is, whether
> you use RAID or not shouldn't impact the location of a topic in your
> directory structure.


Even if we make the physical layout on disk representative of the directory
hierarchy, I think this will not be a concern. Correct me if I am missing
something.

> Not sure if this is what you are saying or not. This
> does raise the question of how to do the disk layout. The simplest thing
> would be to keep the flat data directories but make the names of the
> partitions on disk just be logical inode numbers and then have a separate
> mapping of these inodes to logical names stored in ZK with a cache. I think
> this would make things like rename fast and atomic. The downside of this is
> that the 'ls' command will no longer tell you much about the data on a
> broker.
>

Enabling renaming of topics is definitely something that will be nice to
have; however, with the flat structure we won't be able to encrypt
different directories/namespaces with different keys. Renames can still be
achieved with a directory hierarchy on disk by using logical names, though
each dir will need a logical name.
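
For illustration, the flat layout Jay describes above (on-disk names are
logical inode numbers, with a separate map from logical names to inodes) could
look roughly like the sketch below. The class and method names are
hypothetical, and a plain dict stands in for the mapping he suggests storing
in ZK; making the swap atomic in a real store is not shown.

```python
import itertools
from typing import Dict


class LogicalNameTable:
    """In-memory stand-in for a map of logical partition path -> inode number."""

    def __init__(self) -> None:
        self._next_inode = itertools.count(1)
        self._inode_by_path: Dict[str, int] = {}

    def create(self, logical_path: str) -> int:
        """Allocate an inode for a new partition."""
        if logical_path in self._inode_by_path:
            raise ValueError(f"{logical_path} already exists")
        inode = next(self._next_inode)
        self._inode_by_path[logical_path] = inode
        return inode

    def on_disk_name(self, logical_path: str) -> str:
        """The data directory name is just the inode number."""
        return str(self._inode_by_path[logical_path])

    def rename(self, old_prefix: str, new_prefix: str) -> None:
        """Re-key every path under old_prefix; nothing on disk is touched."""
        old_prefix, new_prefix = old_prefix.rstrip("/"), new_prefix.rstrip("/")
        moved = {
            new_prefix + path[len(old_prefix):]: inode
            for path, inode in self._inode_by_path.items()
            if path == old_prefix or path.startswith(old_prefix + "/")
        }
        if not moved:
            raise KeyError(old_prefix)
        moved_inodes = set(moved.values())
        remaining = {p: i for p, i in self._inode_by_path.items()
                     if i not in moved_inodes}
        if moved.keys() & remaining.keys():
            raise ValueError(f"{new_prefix} collides with an existing path")
        # Single assignment: readers see either the old or the new mapping.
        self._inode_by_path = {**remaining, **moved}


table = LogicalNameTable()
table.create("/dc/ns/topic/0")
table.create("/dc/ns/topic/1")
table.rename("/dc/ns/topic", "/dc/ns/renamed")
print(table.on_disk_name("/dc/ns/renamed/0"))
```

The rename only rewrites keys in the map, which is why it is cheap, and also
why 'ls' on the broker shows only numbers.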


> -Jay
>
> On Wed, Oct 21, 2015 at 12:43 PM, Ashish Singh <asingh@cloudera.com>
> wrote:
>
> > In last KIP hangout following questions were raised.
> >
> >    1.
> >
> >    *Whether or not to support move command? If yes, how do we support
> it.*
> >    I think *move* command will be essential, once we start supporting
> >    directories. However, implementation might be a bit convoluted. A few
> >    things required for it will be, ability to mark a topic unavailable
> > during
> >    the move, update brokers’ metadata cache to reflect the move.
> >    2.
> >
> >    *How will acls/ configs inheritance work?*
> >    Say we have /dc/ns/topic.
> >    dc has dc_acl and dc_config. Similarly for ns and topic.
> >    For being able to perform an action on /dc/ns/topic, the user must
> have
> >    required perms on dc, ns and topic for that operation. For example,
> > User1
> >    will need DESCRIBE permissions on dc, ns and topic to be able to
> > describe
> >    /dc/ns/topic.
> >    For configs, configs for /dc/ns/topic will be topic_config +
> ns_config +
> >    dc_config, in that order. So, if a config is specified for topic then
> > that
> >    will be used, else its parent (ns) will be checked for that config,
> and
> >    this goes on.
> >    3.
> >
> >    *Will supporting n-deep hierarchy be a concern?*
> >    This can be a performance concern, however it sounds more like a misuse
> >    of the functionality or bad organization of topics. We can have a
> depth
> >    limit, but I am not sure if it is required.
> >    4.
> >
> >    *Will we continue to support multi-directory on disk, that was
> proposed
> >    in KAFKA-188?*
> >    Yes, we should be able to support that. It is within those
> directories,
> >    namespaces will be created. The heuristics for choosing least loaded
> >    disc/dir will remain same.
> >    5.
> >
> >    *Will it be required to move existing topics from default directory/
> >    namespace to a particular directory/ namespace to enable mirror-maker
> >    replicate topics in that directory/namespace?*
> >    I do not think it will be required, as one can simply add /*/* to
> >    mirror-maker’s blacklist and this will only capture topics that exist
> in
> >    default namespace. @Joel, does this answer your question?
> >
> >
> >
> > On Fri, Oct 16, 2015 at 6:33 PM, Ashish Singh <asingh@cloudera.com>
> wrote:
> >
> > > On Thu, Oct 15, 2015 at 1:30 PM, Jiangjie Qin
> <jqin@linkedin.com.invalid
> > >
> > > wrote:
> > >
> > >> Hey Jay,
> > >>
> > >> If we allow consumer to subscribe to /*/my-event, does that mean we
> > allow
> > >> consumer to consume cross namespaces?
> > >
> > > That is the idea. If a user has permissions then yes, he should be able
> > to
> > > consume from as many namespaces as he wants.
> > >
> > >
> > >> In that case it seems not
> > >> "hierarchical" but more like a name field filtering. i.e. user can
> > choose
> > >> to consume from topic where datacenter={x,y},
> > >> topic_name={my-topic1,mytopic2}. Am I understanding right?
> > >>
> > > I think it is still hierarchical, however with possible filtering (as
> you
> > > said).
> > >
> > >>
> > >> Thanks,
> > >>
> > >> Jiangjie (Becket) Qin
> > >>
> > >> On Wed, Oct 14, 2015 at 12:49 PM, Jay Kreps <jay@confluent.io> wrote:
> > >>
> > >> > Hey Jason,
> > >> >
> > >> > I actually think this is one of the advantages. The problem we have
> > >> today
> > >> > is that you can't really do bidirectional replication between
> clusters
> > >> > because it would actually be a feedback loop.
> > >> >
> > >> > So the intended use would be that you would have a structure where
> the
> > >> > top-level directory was DIFFERENT but the topic names were the same,
> > so
> > >> if
> > >> > you maintain
> > >> >   /chicago-datacenter/actual-topics
> > >> >   /oregon-datacenter/actual-topics
> > >> >   etc.
> > >> > Then you replicate
> > >> >   /chicago-datacenter/* => /oregon-datacenter
> > >> > and
> > >> >   /oregon-datacenter/* => /chicago-datacenter
> > >> >
> > >> > People who want the aggregate feed subscribe to /*/my-event.
> > >> >
> > >> > The nice thing about this is it gives a unified namespace across all
> > >> > locations.
> > >> >
> > >> > Basically exactly what we do now but you no longer need to add new
> > >> clusters
> > >> > to get the namespacing.
> > >> >
> > >> > -Jay
> > >> >
> > >> >
> > >> > On Wed, Oct 14, 2015 at 11:24 AM, Jason Gustafson <
> jason@confluent.io
> > >
> > >> > wrote:
> > >> >
> > >> > > Hey Ashish, thanks for the write-up. I think having a namespace
> > >> > capability
> > >> > > is a useful feature for Kafka, in particular with the addition
of
> > the
> > >> > > authorization layer. I probably prefer Jay's hierarchical approach
> > if
> > >> > we're
> > >> > > going to embed the namespace in the topic name since it seems
more
> > >> > general.
> > >> > > That said, one advantage of having a namespace independent of
the
> > >> topic
> > >> > > name is that it simplifies replication between namespaces a bit
> > since
> > >> you
> > >> > > don't have to parse and rewrite topic names. Assuming that
> > >> hierarchical
> > >> > > topics will happen eventually anyway, I imagine a common pattern
> > >> would be
> > >> > > to preserve the same directory structure in multiple namespaces,
> so
> > >> > having
> > >> > > an easy mechanism for applications to switch between them would
be
> > >> nice.
> > >> > > The namespace is kind of analogous to a chroot in this case.
Of
> > course
> > >> > you
> > >> > > can achieve the same thing by having a configurable topic prefix,
> > just
> > >> > you
> > >> > > have to do all the topic rewriting, which I'm guessing will be
a
> > >> little
> > >> > > annoying to implement in all of the clients and tools. However,
> the
> > >> > > tradeoff (as you mention in the KIP) is that all request schemas
> > have
> > >> to
> > >> > be
> > >> > > updated, which is also annoying.
> > >> > >
> > >> > > -Jason
> > >> > >
> > >> > > On Wed, Oct 14, 2015 at 12:03 AM, Ashish Singh <
> asingh@cloudera.com
> > >
> > >> > > wrote:
> > >> > >
> > >> > > > On Mon, Oct 12, 2015 at 7:37 PM, Gwen Shapira <
> gwen@confluent.io>
> > >> > wrote:
> > >> > > >
> > >> > > > > This works really nicely from the consumer side, but
what
> about
> > >> the
> > >> > > > > producer? If there are no more topics,do we allow producing
> to a
> > >> > > > directory
> > >> > > > > and have the Partitioner hash-partition messages between
all
> > >> > partitions
> > >> > > > in
> > >> > > > > the multiple levels in a directory?
> > >> > > > >
> > >> > > > Good point.
> > >> > > >
> > >> > > > I am personally in favor of maintaining current behavior
for
> > >> producer,
> > >> > > > i.e., letting users to only produce to a topic. This is
> different
> > >> for
> > >> > > > consumers, the suggested behavior is inline with current
> behavior.
> > >> One
> > >> > > can
> > >> > > > use regex subscription to achieve the same even today.
> > >> > > >
> > >> > > > >
> > >> > > > > Also, I think we want to preserve the consumer terminology
of
> > >> > > "subscribe"
> > >> > > > > to topics / directories, but "assign" partitions -
since the
> > >> consumer
> > >> > > > > behavior is different in those cases.
> > >> > > > >
> > >> > > > > On Mon, Oct 12, 2015 at 7:16 PM, Jay Kreps <jay@confluent.io>
> > >> wrote:
> > >> > > > >
> > >> > > > > > Okay this is similar to what I think we have talked
about
> > >> before.
> > >> > Let
> > >> > > > me
> > >> > > > > > elaborate on the idea that I think has been floating
> > >> around--it's
> > >> > > > pretty
> > >> > > > > > similar with a few differences.
> > >> > > > > >
> > >> > > > > > I think what you are calling the "default namespace"
is
> > >> basically
> > >> > > what
> > >> > > > I
> > >> > > > > > would call the "current working directory" with
paths not
> > >> beginning
> > >> > > > with
> > >> > > > > > '/' being interpreted relative to this directory
as in the
> fs.
> > >> > > > > >
> > >> > > > > > One thing you have to work out is what levels
in this
> > hierarchy
> > >> you
> > >> > > can
> > >> > > > > > actually subscribe to. I think you are assuming
only what we
> > >> > > currently
> > >> > > > > > consider a "topic", i.e. the first level of directories
but
> > not
> > >> the
> > >> > > > > > partitions or parent dirs, would be subscribable.
If you
> think
> > >> > about
> > >> > > > it,
> > >> > > > > > though, that constraint is a bit arbitrary.
> > >> > > > > >
> > >> > > > > > I'd propose instead the semantics that:
> > >> > > > > > - Subscribing to /a/b/c/0 means subscribing to
the 0th
> > >> partition of
> > >> > > > topic
> > >> > > > > > "c" in directory /a/b
> > >> > > > > > - Subscribing to /a/b/c means subscribing to all
partitions
> in
> > >> > > > > > topic/directory "c"
> > >> > > > > > - Subscribing to /a/b means subscribing to all
partitions in
> > all
> > >> > > > > > topics/subdirectories under a/b recursively
> > >> > > > > >
> > >> > > > > > Effectively the concept of topics goes away entirely--you
> just
> > >> have
> > >> > > > > > partitions/logs and directories. In this respect
rather than
> > >> adding
> > >> > > new
> > >> > > > > > concepts this new feature would actually just generalize
> what
> > >> we
> > >> > > have
> > >> > > > > > (which I think is a good thing).
> > >> > > > > >
> > >> > > > > > -Jay
> > >> > > > > >
> > >> > > > > > On Mon, Oct 12, 2015 at 6:24 PM, Ashish Singh
<
> > >> asingh@cloudera.com
> > >> > >
> > >> > > > > wrote:
> > >> > > > > >
> > >> > > > > > > On Mon, Oct 12, 2015 at 5:42 PM, Jay Kreps
<
> > jay@confluent.io>
> > >> > > wrote:
> > >> > > > > > >
> > >> > > > > > > > Great. I definitely would strongly favor
carrying over
> > >> user's
> > >> > > > > intuition
> > >> > > > > > > > from FS unless we think we need a very
different model.
> > The
> > >> > minor
> > >> > > > > > details
> > >> > > > > > > > like the separator and namespace term
will help with
> that.
> > >> > > > > > > >
> > >> > > > > > > > Follow-up question, say I have a layout
like
> > >> > > > > > > >    /chicago-datacenter/user-events/pageviews
> > >> > > > > > > > Can I subscribe to
> > >> > > > > > > >    /chicago-datacenter/user-events
> > >> > > > > > > >
> > >> > > > > > > Yes, however they will need a regex like
> > >> > > > > > > /chicago-datacenter/user-events/*
> > >> > > > > > >
> > >> > > > > > > > to get the full firehose of user events
from chicago?
> Can
> > I
> > >> > > > subscribe
> > >> > > > > > to
> > >> > > > > > > >    /*/user-events
> > >> > > > > > > > to get user events originating from
all datacenters?
> > >> > > > > > > >
> > >> > > > > > > Yes, however they will need a regex like
> > >> > > > > > > /chicago-datacenter/user-events/*
> > >> > > > > > > Yes
> > >> > > > > > >
> > >> > > > > > > >
> > >> > > > > > > > (Assuming, for now, that these are all
in the same
> > >> cluster...)
> > >> > > > > > > >
> > >> > > > > > > > Also, just to confirm, it sounds from
the proposal like
> > >> config
> > >> > > > > > overrides
> > >> > > > > > > > would become fully hierarchical so you
can override
> config
> > >> at
> > >> > any
> > >> > > > > > > directory
> > >> > > > > > > > point. This will add complexity in implementation
but I
> > >> think
> > >> > > will
> > >> > > > > > likely
> > >> > > > > > > > be much more operator friendly.
> > >> > > > > > > >
> > >> > > > > > > Yes, that is the idea.
> > >> > > > > > >
> > >> > > > > > > >
> > >> > > > > > > > There are about a thousand details to
discuss in terms
> of
> > >> how
> > >> > > this
> > >> > > > > > would
> > >> > > > > > > > impact the metadata request, various
zk entries, and
> > various
> > >> > > other
> > >> > > > > > > aspects,
> > >> > > > > > > > but probably it makes sense to first
agree on how we
> would
> > >> want
> > >> > > it
> > >> > > > to
> > >> > > > > > > work
> > >> > > > > > > > and then start to dive into how to implement
that.
> > >> > > > > > > >
> > >> > > > > > > Agreed.
> > >> > > > > > >
> > >> > > > > > > >
> > >> > > > > > > > -Jay
> > >> > > > > > > >
> > >> > > > > > > > On Mon, Oct 12, 2015 at 5:28 PM, Ashish
Singh <
> > >> > > asingh@cloudera.com
> > >> > > > >
> > >> > > > > > > wrote:
> > >> > > > > > > >
> > >> > > > > > > > > Hey Jay, thanks for reviewing the
proposal. Answers
> > >> inline.
> > >> > > > > > > > >
> > >> > > > > > > > > On Mon, Oct 12, 2015 at 10:53 AM,
Jay Kreps <
> > >> > jay@confluent.io>
> > >> > > > > > wrote:
> > >> > > > > > > > >
> > >> > > > > > > > > > Hey guys,
> > >> > > > > > > > > >
> > >> > > > > > > > > > I think this is an important
feature and one we've
> > >> talked
> > >> > > about
> > >> > > > > > for a
> > >> > > > > > > > > > while. I really think trying
to invent a new
> > >> nomenclature
> > >> > is
> > >> > > > > going
> > >> > > > > > to
> > >> > > > > > > > > make
> > >> > > > > > > > > > it hard for people to understand,
though. As such I
> > >> > recommend
> > >> > > > we
> > >> > > > > > call
> > >> > > > > > > > > > namespaces "directories" and
denote them with
> > '/'--this
> > >> > will
> > >> > > > make
> > >> > > > > > the
> > >> > > > > > > > > > feature 1000x more understandable
to people.
> > >> > > > > > > > >
> > >> > > > > > > > > Essentially you are suggesting
two things here.
> > >> > > > > > > > > 1. Use "Directory" instead of "Namespace"
as it is
> more
> > >> > > > intuitive.
> > >> > > > > I
> > >> > > > > > > > agree.
> > >> > > > > > > > > 2. Make '/' as delimiter instead
of ':'. Fine with me
> > and
> > >> I
> > >> > > agree
> > >> > > > > if
> > >> > > > > > we
> > >> > > > > > > > > call these directories, '/' is
the way to go.
> > >> > > > > > > > >
> > >> > > > > > > > > I think we should inherit the
> > >> > > > > > > > > > semantics of normal unix fs
in so far as it makes
> > sense.
> > >> > > > > > > > > >
> > >> > > > > > > > > > In this approach we get rid
of topics entirely,
> > instead
> > >> we
> > >> > > > really
> > >> > > > > > > just
> > >> > > > > > > > > have
> > >> > > > > > > > > > partitions which are the equivalent
of a file and
> > retain
> > >> > > their
> > >> > > > > > > numeric
> > >> > > > > > > > > > names, and the existing topic
concept is just the
> > first
> > >> > > > directory
> > >> > > > > > > level
> > >> > > > > > > > > but
> > >> > > > > > > > > > we generalize to allow arbitrarily
many more levels
> of
> > >> > > nesting.
> > >> > > > > > This
> > >> > > > > > > > > allows
> > >> > > > > > > > > > categorization of data, such
as
> > >> > > > > > /datacenter1/user-events/page-views/3
> > >> > > > > > > > and
> > >> > > > > > > > > > you can subscribe, apply configs
or permissions at
> any
> > >> > level
> > >> > > of
> > >> > > > > the
> > >> > > > > > > > > > hierarchy.
> > >> > > > > > > > > >
> > >> > > > > > > > > +1. This actually requires just
a minor change to
> > existing
> > >> > > > > proposal,
> > >> > > > > > > > i.e.,
> > >> > > > > > > > > "some:namespace:topic" becomes
"some/namespace/topic".
> > >> > > > > > > > >
> > >> > > > > > > > > >
> > >> > > > > > > > > > I'm actually not 100% sure
what the semantics of
> > >> accessing
> > >> > > data
> > >> > > > > in
> > >> > > > > > > > > > differing namespaces is in
the current proposal,
> maybe
> > >> you
> > >> > > can
> > >> > > > > > > clarify
> > >> > > > > > > > > > Ashish?
> > >> > > > > > > > >
> > >> > > > > > > > > I will add more info to KIP on
this, however I think a
> > >> client
> > >> > > > > should
> > >> > > > > > be
> > >> > > > > > > > > able to access data in any namespace
as long as
> > following
> > >> > > > > conditions
> > >> > > > > > > are
> > >> > > > > > > > > satisfied.
> > >> > > > > > > > >
> > >> > > > > > > > > 1. Namespace, the client is trying
to access, exists.
> > >> > > > > > > > > 2. The client has sufficient permissions
on the
> > namespace
> > >> for
> > >> > > > type
> > >> > > > > of
> > >> > > > > > > > > operation the client is trying
to perform on a topic
> > >> within
> > >> > > that
> > >> > > > > > > > namespace.
> > >> > > > > > > > > 3. The client has sufficient permissions
on the topic
> > for
> > >> > type
> > >> > > of
> > >> > > > > > > > operation
> > >> > > > > > > > > the client is trying to perform
on that topic.
> > >> > > > > > > > >
> > >> > > > > > > > > If we choose to go with what you
suggested earlier
> that
> > >> just
> > >> > > have
> > >> > > > > > > > hierarchy
> > >> > > > > > > > > of directories, then step 3 will
actually be covered
> in
> > >> step
> > >> > 2.
> > >> > > > > > > > >
> > >> > > > > > > > > In the current proposal, consumers
will subscribe to a
> > >> topic
> > >> > > in a
> > >> > > > > > > > namespace
> > >> > > > > > > > > by specifying <namespace>:<topic>
as the topic name.
> > They
> > >> can
> > >> > > > > > subscribe
> > >> > > > > > > > to
> > >> > > > > > > > > topics from multiple namespaces.
> > >> > > > > > > > >
> > >> > > > > > > > > Let me know if I totally missed
your question.
> > >> > > > > > > > >
> > >> > > > > > > > > Since the point of Kafka is sharing
data I think it is
> > >> really
> > >> > > > > > > > > > important that the grouping
be just for
> > >> > > > > > > > > convenience/permissions/config/etc
> > >> > > > > > > > > > and that it remain possible
to access multiple
> > >> > > > > > directories/namespaces
> > >> > > > > > > > > from
> > >> > > > > > > > > > the same client.
> > >> > > > > > > > > >
> > >> > > > > > > > > Totally agree with you.
> > >> > > > > > > > >
> > >> > > > > > > > > >
> > >> > > > > > > > > > -Jay
> > >> > > > > > > > > >
> > >> > > > > > > > > > On Fri, Oct 9, 2015 at 6:32
PM, Ashish Singh <
> > >> > > > > asingh@cloudera.com>
> > >> > > > > > > > > wrote:
> > >> > > > > > > > > >
> > >> > > > > > > > > > > Hey Guys,
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > I just created KIP-37
for adding namespaces to
> > Kafka.
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > KIP-37
> > >> > > > > > > > > > > <
> > >> > > > > > > > > > >
> > >> > > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > >
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-37+-+Add+Namespaces+to+Kafka
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > tracks the proposal.
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > The idea is to make Kafka
support multi-tenancy
> via
> > >> > > > namespaces.
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > Feedback and comments
are welcome.
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > --
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > Regards,
> > >> > > > > > > > > > > Ashish
> > >> > > > > > > > > > >
> > >> > > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > > > --
> > >> > > > > > > > >
> > >> > > > > > > > > Regards,
> > >> > > > > > > > > Ashish
> > >> > > > > > > > >
> > >> > > > > > > >
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > > > --
> > >> > > > > > >
> > >> > > > > > > Regards,
> > >> > > > > > > Ashish
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > > --
> > >> > > >
> > >> > > > Regards,
> > >> > > > Ashish
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >
> > >
> > >
> > > --
> > >
> > > Regards,
> > > Ashish
> > >
> >
> >
> >
> > --
> >
> > Regards,
> > Ashish
> >
>



-- 

Regards,
Ashish
