kafka-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ismael Juma <ism...@juma.me.uk>
Subject Re: [DISCUSS] KIP-179: Change ReassignPartitionsCommand to use AdminClient
Date Tue, 01 Aug 2017 13:25:06 GMT
Hi Tom,

A high-level point for discussion before going into the details. The
proposed protocol API `alterTopics` has 2 types of operations:

1. Operations that cause data movement (or deletion): increase/decrease of
replication factor and partition reassignment. These are currently done by
`kafka-reassign-partitions.sh` and having a way to check the progress is
useful.

2. Operations that don't involve data movement: increase the number of
partitions and update topic configs (this is not in the proposal, but could
be). These are currently done by `kafka-topics.sh` and they don't usually
take very long so progress reporting is less important.

It would be worth thinking whether having 2 separate protocol APIs would be
better. I can see pros and cons, so I'd be interested in what you and
others think.

Ismael

On Tue, Aug 1, 2017 at 10:19 AM, Tom Bentley <t.j.bentley@gmail.com> wrote:

> I have added the timeout I mentioned before, because it makes the
> implementation of topic alteration more symmetric with the topic creation
> APIs.
>
> I have also added a section ("Policy") on retrofitting the
> CreateTopicPolicy's rules to topic alteration requests, and made a few
> other minor fixes.
>
> I've still not factored out the topic name from the ReplicaStatusRequest
> (Ismael, do you have an opinion about that?)
>
> If any one else has some feedback on this KIP that would be great.
> Otherwise, I would like to start a vote on this KIP before the end of the
> week.
>
> Thanks,
>
> Tom
>
> On 26 July 2017 at 11:45, Tom Bentley <t.j.bentley@gmail.com> wrote:
>
> > I've updated the KIP to fix those niggles, but I've not factored out the
> > topic name from the ReplicaStatusRequest, yet.
> >
> > Looking at the topic creation APIs in more detail, the
> CreateTopicsOptions
> > has
> >
> > * `shouldValidateOnly()`, which would make a lot of sense for the alter
> > topic APIs
> > * `timeoutMs()`, which I'm not sure sure about...
> >
> > Topic creation doesn't require shifting replicas between brokers so it's
> > reasonable to support timeout, because we don't expect it to take very
> long.
> >
> > Topic alteration usually takes a while because we are going to have to
> > move replicas. Since we're adding a whole API to track the progress of
> that
> > replication, I'm inclined to think that having a timeout is a bit
> pointless.
> >
> > But should the replicaStatus() API have a timeout? I suppose it probably
> > should.
> >
> >
> > On 26 July 2017 at 10:58, Tom Bentley <t.j.bentley@gmail.com> wrote:
> >
> >> Thanks Paolo,
> >>
> >>   *   in the "Public Interfaces" section you wrote
> >>> alterTopics(Set<AlterTopics>) but then a collection is used (instead
> of a
> >>> set) in the Proposed Changes section. I'm ok with collection.
> >>>
> >>
> >> Agree it should be Collection.
> >>
> >>
> >>>   *   in the summary of the alterTopics method you say "The request can
> >>> change the number of partitions, replication factor and/or the
> partition
> >>> assignments." I think that the "and/or" is misleading (at least for
> me).
> >>> For the TopicCommand tool you can specify partitions AND replication
> factor
> >>> ... OR partition assignments (but not for example partitions AND
> >>> replication factor AND partition assignments). Maybe it could be "The
> >>> request can change the number of partitions and the related replication
> >>> factor or specifying a partition assignments."
> >>>
> >>
> >> Is there a reason why we can't support changing all three at once? It
> >> certainly makes conceptual sense to, say, increase the number of
> partitions
> >> and replication factor, and specify how you want the partitions
> assigned.
> >> And doing two separate calls would be less efficient as we sync new
> >> replicas on brokers only to then move them somewhere else.
> >>
> >> If there is a reason we don't want to support changing all three, then
> we
> >> can return the error INVALID_REQUEST (42). That would allow us to
> support
> >> changing all three at some time in the future, without having to change
> the
> >> API.
> >>
> >>
> >>>   *   I know that it would be a breaking change in the Admin Client API
> >>> but why having an AlteredTopic class which is quite similar to the
> already
> >>> existing NewTopic class ? I know that using NewTopic for the alter
> method
> >>> could be misleading. What about renaming NewTopic in something like
> >>> AdminTopic ? At same time it means that the TopicDetails class (which
> you
> >>> can get from the current NewTopic) should be outside the
> >>> CreateTopicsRequest because it could be used in the AlterTopicsRequest
> as
> >>> well.
> >>>
> >>
> >> One problem with this is it tends to inhibit future API changes for
> >> either newTopics() or alterTopics(), because any common class needs to
> make
> >> sense in both contexts.
> >>
> >> For createTopics() we get to specify some configs (the
> >> Map<String,String>), but since the AdminClient already has
> alterConfigs()
> >> for changing topic configs after topic creation I don't think it's
> right to
> >> also support changing those configs via alterTopics() as well. But
> having
> >> them in a common AdminTopic class would suggest that that was supported.
> >> Yes, alterTopics could return INVALID_REQUEST if it was given topic
> >> configs, but this is just making the API harder to use since it is
> >> weakening of the type safety of the API.
> >>
> >> I suppose we *could* write a common TopicDetails class and make the
> >> existing nested one extend the common one, with deprecations, to
> eventually
> >> remove the nested one.
> >>
> >>
> >>
> >>>   *   A typo in the ReplicaStatus : gpartition() instead of partition()
> >>>   *   In the AlterTopicRequets
> >>>      *   the replication factor should be INT16
> >>>
> >>
> >> Ah, thanks!
> >>
> >>
> >>>      *   I would use same fields name as CreateTopicsRequest (they are
> >>> quite similar)
> >>>   *   What's the broker id in the ReplicaStatusRequest ?
> >>>
> >>
> >> It's the broker, which is expected to have a replica of the given
> >> partition, that we're querying the status of. It is necessary because
> we're
> >> asking the _leader_ for the partition about (a subset of) the status of
> the
> >> followers. Put another way, to identify the replica of a particular
> >> partition on a particular broker we need the tuple (topic, partition,
> >> broker).
> >>
> >> If we were always interested in the status of the partition across all
> >> brokers we could omit the broker part. But for reassignment we actually
> >> only care about a subset of the brokers.
> >>
> >>
> >>>   *   Thinking aloud, could make sense having "Partition" in the
> >>> ReplicaStatusRequest as an array so that I can specify in only one
> request
> >>> the status for partitions I'm interested in, in order to avoid e
> request
> >>> for each partition for the same topic. Maybe empty array could mean ..
> >>> "give me status for all partitions of this topic". Of course it means
> that
> >>> the ReplicaStatusResponse should change because we should have an array
> >>> with partition, broker, lag and so on
> >>>
> >>
> >> You already can specify in one request the status for all the partitions
> >> you're interested in (ReplicaStatus can be repeated/is an array field).
> >>
> >> We could factor out the topic to avoid repeating it, which would be more
> >> efficient when we're querying the status of many partitions of a topic
> >> and/or there are many brokers holding replicas. In other words, we could
> >> factor it to look like this:
> >>
> >> ReplicaStatusRequest => [TopicReplicas]
> >>   TopicReplicas => Topic [PartitionReplica]
> >>     Topic => string
> >>     PartitionReplica => Partition Broker
> >>       Partition => int32
> >>       Broker => int32
> >>
> >> Does this make sense to you?
> >>
> >>
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message