kafka-dev mailing list archives

From Jason Gustafson <ja...@confluent.io>
Subject Re: [DISCUSS] KIP-98: Exactly Once Delivery and Transactional Messaging
Date Fri, 03 Feb 2017 20:04:01 GMT
Hey Rajini,

Thanks for the questions. Responses below:

> 1. Will the transaction coordinator check topic ACLs based on the
> requesting client's credentials? Access to transaction logs, topics being
> added for transaction etc?

Good question. I think it makes sense to check topic Write permission when
adding partitions to the transaction. I'll add this to the document.
Perhaps authorization to the transaction log itself, however, can be
assumed from having access to the ProducerTransactionalId resource? This
would be similar to how access to __consumer_offsets is assumed if the
client has access to the Group resource.

> 2. If I create a transactional produce request (by hand, not using the
> producer API) with a random PID (random, hence unlikely to be in use), will
> the broker append a transactional message to the logs, preventing LSO from
> moving forward? What validation will broker do for PIDs?

Yes, that is correct. Validation of the TransactionalId to PID binding is a
known gap in the current proposal, and is discussed in the design document.
Now that I'm thinking about it a bit more, I think there is a good case for
including the TransactionalId in the ProduceRequest (I think Jun suggested
this previously). Verifying it does not ensure that the included PID is
correct, but it does ensure that the client is authorized to use
transactions. If the client wanted to do an "endless transaction attack,"
having Write access to the topic and an authorized TransactionalId is all
they would need anyway even if we could authorize the PID itself. This
seems like a worthwhile improvement.
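
To make the proposed check concrete, here is a minimal sketch, assuming a toy ACL model. The ProduceRequest tuple, the ACL triples and authorize_produce are hypothetical names for illustration, not the broker's actual authorizer interface. It shows what the check does and does not cover: the client must hold Write on the topic and be authorized for the TransactionalId it presents, but the PID is not validated at all.

```python
from typing import NamedTuple, Optional, Set, Tuple

# Hypothetical ACL entry: (principal, resource_type, resource_name).
Acl = Tuple[str, str, str]


class ProduceRequest(NamedTuple):
    """Illustrative stand-in for the fields relevant to authorization."""
    principal: str
    topic: str
    pid: int
    transactional_id: Optional[str]  # None for a non-transactional producer


def authorize_produce(req: ProduceRequest, acls: Set[Acl]) -> bool:
    """Return True if the request passes the two checks discussed above."""
    # Check 1: Write permission on the topic being produced to.
    if (req.principal, "Topic", req.topic) not in acls:
        return False
    # Non-transactional requests need nothing further.
    if req.transactional_id is None:
        return True
    # Check 2: the client is authorized for the TransactionalId it sent.
    # Note that req.pid is never inspected: the PID binding stays unverified.
    return (req.principal, "TransactionalId", req.transactional_id) in acls


acls = {("alice", "Topic", "orders"), ("alice", "TransactionalId", "txn-a")}
allowed = authorize_produce(ProduceRequest("alice", "orders", 7, "txn-a"), acls)
denied = authorize_produce(ProduceRequest("alice", "orders", 7, "txn-b"), acls)
```

Any PID value passes as long as the two ACL lookups succeed, which is the remaining gap described above.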

For future work, my half-baked idea to authorize the PID binding is to
leverage the delegation work in KIP-48. When the PID is generated, we can
give the producer a token which is then used in produce requests (say an
HMAC covering the TransactionalId and PID).
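
As a rough illustration of the token idea, the sketch below computes an HMAC over the TransactionalId and PID using Python's standard hmac module. The secret handling, the byte layout of the message and the function names are all assumptions made for the example; nothing here is taken from KIP-48 or the design document.

```python
import hashlib
import hmac
import struct


def make_pid_token(secret: bytes, transactional_id: str, pid: int) -> bytes:
    """Build a token binding a TransactionalId to a PID (hypothetical layout)."""
    tid = transactional_id.encode("utf-8")
    # Length-prefix the id so different (id, pid) pairs cannot produce the
    # same concatenated byte string.
    message = struct.pack(">H", len(tid)) + tid + struct.pack(">q", pid)
    return hmac.new(secret, message, hashlib.sha256).digest()


def verify_pid_token(secret: bytes, transactional_id: str, pid: int,
                     token: bytes) -> bool:
    """Recompute the token and compare in constant time."""
    expected = make_pid_token(secret, transactional_id, pid)
    return hmac.compare_digest(expected, token)


# The secret would be known only to the brokers (assumption).
secret = b"broker-side-secret"
token = make_pid_token(secret, "txn-a", 42)
ok = verify_pid_token(secret, "txn-a", 42, token)
forged = verify_pid_token(secret, "txn-a", 43, token)
```

A broker holding the secret could verify the token on each produce request without a lookup, and a client that picks a random PID could not produce a matching token.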

> 3. Will every broker check that a client sending transactional produce
> requests at least has write access to transaction log topic since it is not
> validating transactional.id (for every produce request)?

 4. I understand that brokers cannot authorize the transactional id for each
> produce request since requests contain only the PID. But since there is a
> one-to-one mapping between PID and transactional.id, and a connection is
> never expected to change its transactional.id, perhaps it is feasible to
> add authorization and cache the results in the Session? Perhaps not for
> version 1, but feels like it will be good to close the security gap here.
> Obviously it would be simpler if transactional.id was in the produce
> request if the overhead was acceptable.

I think my response above addresses both of these. We should include the
TransactionalId in the ProduceRequest. Of course it need not be included in
the message format, so I'm not too concerned about the additional overhead
it adds.


On Fri, Feb 3, 2017 at 6:52 AM, Ismael Juma <ismael@juma.me.uk> wrote:

> Comments inline.
>
> On Thu, Feb 2, 2017 at 6:28 PM, Jason Gustafson <jason@confluent.io>
> wrote:
>
> > Took me a while to remember why we didn't do this. The timestamp that
> > is included at the message set level is the max timestamp of all
> > messages in the message set as is the case in the current message
> > format (I will update the document to make this explicit). We could
> > make the message timestamps relative to the max timestamp, but that
> > makes serialization a bit awkward since the timestamps are not assumed
> > to be increasing sequentially or monotonically. Once the messages in
> > the message set had been determined, we would need to go back and
> > adjust the relative timestamps.
>
> Yes, I thought this would be a bit tricky and hence why I mentioned the
> option of adding a new field at the message set level for the first
> timestamp even though that's not ideal either.
>
> > Here's one idea. We let the timestamps in the messages be varints, but
> > we make their values be relative to the timestamp of the previous
> > message, with the timestamp of the first message being absolute. For
> > example, if we had timestamps 500, 501, 499, then we would write 500
> > for the first message, 1 for the next, and -2 for the final message.
> > Would that work? Let me think a bit about it and see if there are any
> > problems.
>
> It's an interesting idea. Comparing to the option of having the first
> timestamp in the message set, It's a little more space efficient as we
> don't have both a full timestamp in the message set _and_ a varint in the
> first message (which would always be 0, so we avoid the extra byte) and
> also the deltas could be a little smaller in the common case. The main
> downside is that it introduces a semantics inconsistency between the first
> message and the rest. Not ideal, but maybe we can live with that.
>
> Ismael
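
The delta scheme discussed in the quoted thread can be sketched as follows. This is only an illustration: the thread does not say how negative deltas would be written as varints, so zigzag encoding is assumed here, and all function names are made up for the example.

```python
from typing import Iterator, List


def zigzag(n: int) -> int:
    """Map a signed 64-bit value to an unsigned one (0, -1, 1, -2 -> 0, 1, 2, 3)."""
    return (n << 1) ^ (n >> 63)


def unzigzag(z: int) -> int:
    return (z >> 1) ^ -(z & 1)


def write_varint(value: int) -> bytes:
    """Encode a non-negative integer, 7 bits per byte, low bits first."""
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def read_varints(buf: bytes) -> Iterator[int]:
    value = shift = 0
    for byte in buf:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            yield value
            value = shift = 0


def deltas(timestamps: List[int]) -> List[int]:
    """First value absolute, every later value relative to its predecessor."""
    prev, out = 0, []
    for ts in timestamps:
        out.append(ts - prev)
        prev = ts
    return out


def encode_timestamps(timestamps: List[int]) -> bytes:
    return b"".join(write_varint(zigzag(d)) for d in deltas(timestamps))


def decode_timestamps(buf: bytes) -> List[int]:
    prev, out = 0, []
    for z in read_varints(buf):
        prev += unzigzag(z)
        out.append(prev)
    return out


example = [500, 501, 499]
written = deltas(example)              # [500, 1, -2], as in the thread
encoded = encode_timestamps(example)
restored = decode_timestamps(encoded)  # [500, 501, 499]
```

Because each value depends only on the previous one, the writer never has to go back and adjust earlier messages once the set is complete, which was the objection to making timestamps relative to the max.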
