kafka-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Rajini Sivaram <rajinisiva...@gmail.com>
Subject Re: [DISCUSS] KIP-98: Exactly Once Delivery and Transactional Messaging
Date Fri, 03 Feb 2017 21:52:14 GMT
Hi Jason,

Thank you for the responses. Agree that authorizing transactional.id in the
producer requests will be good enough for version 1. And making it tighter
in future based on delegation tokens sounds good too.

Regards,

Rajini


On Fri, Feb 3, 2017 at 8:04 PM, Jason Gustafson <jason@confluent.io> wrote:

> Hey Rajini,
>
> Thanks for the questions. Responses below:
>
>
> > 1. Will the transaction coordinator check topic ACLs based on the
> > requesting client's credentials? Access to transaction logs, topics being
> > added for transaction etc?
>
>
> Good question. I think it makes sense to check topic Write permission when
> adding partitions to the transaction. I'll add this to the document.
> Perhaps authorization to the transaction log itself, however, can be
> assumed from having access to the ProducerTransactionalId resource? This
> would be similar to how access to __consumer_offsets is assumed if the
> client has access to the Group resource.
>
> 2. If I create a transactional produce request (by hand, not using the
> > producer API) with a random PID (random, hence unlikely to be in use),
> will
> > the broker append a transactional message to the logs, preventing LSO
> from
> > moving forward? What validation will broker do for PIDs?
>
>
> Yes, that is correct. Validation of the TransactionalId to PID binding is a
> known gap in the current proposal, and is discussed in the design document.
> Now that I'm thinking about it a bit more, I think there is a good case for
> including the TransactionalId in the ProduceRequest (I think Jun suggested
> this previously). Verifying it does not ensure that the included PID is
> correct, but it does ensure that the client is authorized to use
> transactions. If the client wanted to do an "endless transaction attack,"
> having Write access to the topic and an authorized transactionalID is all
> they would need anyway even if we could authorize the PID itself. This
> seems like a worthwhile improvement.
>
> For future work, my half-baked idea to authorize the PID binding is to
> leverage the delegation work in KIP-48. When the PID is generated, we can
> give the producer a token which is then used in produce requests (say an
> hmac covering the TransactionalId and PID).
>
>
> > 3. Will every broker check that a client sending transactional produce
> > requests at least has write access to transaction log topic since it is
> not
> > validating transactional.id (for every produce request)?
>
>  4. I understand that brokers cannot authorize the transactional id for
> each
> > produce request since requests contain only the PID. But since there is a
> > one-to-one mapping between PID and transactional.id, and a connection is
> > never expected to change its transactional.id, perhaps it is feasible to
> > add authorization and cache the results in the Session? Perhaps not for
> > version 1, but feels like it will be good to close the security gap here.
> > Obviously it would be simpler if transactional.id was in the produce
> > request if the overhead was acceptable.
>
>
> I think my response above addresses both of these. We should include the
> TransactionalId in the ProduceRequest. Of course it need not be included in
> the message format, so I'm not too concerned about the additional overhead
> it adds.
>
> Thanks,
> Jason
>
>
> On Fri, Feb 3, 2017 at 6:52 AM, Ismael Juma <ismael@juma.me.uk> wrote:
>
> > Comments inline.
> >
> > On Thu, Feb 2, 2017 at 6:28 PM, Jason Gustafson <jason@confluent.io>
> > wrote:
> >
> > > Took me a while to remember why we didn't do this. The timestamp that
> is
> > > included at the message set level is the max timestamp of all messages
> in
> > > the message set as is the case in the current message format (I will
> > update
> > > the document to make this explicit). We could make the message
> timestamps
> > > relative to the max timestamp, but that makes serialization a bit
> awkward
> > > since the timestamps are not assumed to be increasing sequentially or
> > > monotonically. Once the messages in the message set had been
> determined,
> > we
> > > would need to go back and adjust the relative timestamps.
> > >
> >
> > Yes, I thought this would be a bit tricky and hence why I mentioned the
> > option of adding a new field at the message set level for the first
> > timestamp even though that's not ideal either.
> >
> > Here's one idea. We let the timestamps in the messages be varints, but we
> > > make their values be relative to the timestamp of the previous message,
> > > with the timestamp of the first message being absolute. For example, if
> > we
> > > had timestamps 500, 501, 499, then we would write 500 for the first
> > > message, 1 for the next, and -2 for the final message. Would that work?
> > Let
> > > me think a bit about it and see if there are any problems.
> > >
> >
> > It's an interesting idea. Comparing to the option of having the first
> > timestamp in the message set, It's a little more space efficient as we
> > don't have both a full timestamp in the message set _and_ a varint in the
> > first message (which would always be 0, so we avoid the extra byte) and
> > also the deltas could be a little smaller in the common case. The main
> > downside is that it introduces a semantics inconsistency between the
> first
> > message and the rest. Not ideal, but maybe we can live with that.
> >
> > Ismael
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message