Mailing-List: contact dev-help@kafka.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@kafka.apache.org
Message-Id: <1512498510.3788994.1194878408.208D3434@webmail.messagingengine.com>
From: Colin McCabe <cmccabe@apache.org>
To: dev@kafka.apache.org
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset="utf-8"
In-Reply-To: <CABtAgwGYUfCtSmsH7dEJDf47ARhrH1nqTWShOg1p6tQhLLyO6A@mail.gmail.com>
Date: Tue, 05 Dec 2017 10:28:30 -0800
References: <1511298156.455181.1180266296.193D3A00@webmail.messagingengine.com>
 <5A1D027C.7020206@trivago.com>
 <1511887086.3498918.1186985376.7F1DEDD6@webmail.messagingengine.com>
 <1511986396.621620.1188577080.5C0A7DEC@webmail.messagingengine.com>
 <CAAaarBaYx0kK59XLbzqzq80wG6QV_2sfiT_vu2t07xaN5WeTfQ@mail.gmail.com>
 <1512063435.1832330.1189640136.2D1EEEC1@webmail.messagingengine.com>
 <CAAaarBZFgpDAj2bEDeH37EZy6_5MDx7Rt3A+Tszbh+V8yPTM7w@mail.gmail.com>
 <1512254273.250506.1191933912.13817ECC@webmail.messagingengine.com>
 <CABtAgwFPo=fH=PgVKDFQeYQN9+1ZU3_vDyFt+=R4Lv2V3+t5_w@mail.gmail.com>
 <1512334541.920899.1192564144.1A4C5D62@webmail.messagingengine.com>
 <CABtAgwGYUfCtSmsH7dEJDf47ARhrH1nqTWShOg1p6tQhLLyO6A@mail.gmail.com>
Subject: Re: [DISCUSS] KIP-227: Introduce Incremental FetchRequests to Increase
 Partition Scalability
archived-at: Tue, 05 Dec 2017 18:28:35 -0000

On Sun, Dec 3, 2017, at 16:28, Becket Qin wrote:
> >The correlation ID is used within a single TCP session, to uniquely
> >associate a request with a response.  The correlation ID is not unique
> >(and has no meaning) outside the context of that single TCP session.
> >
> >Keep in mind, NetworkClient is in charge of TCP sessions, and generally
> >tries to hide that information from the upper layers of the code.  So
> >when you submit a request to NetworkClient, you don't know if that
> >request creates a TCP session, or reuses an existing one.
> 
> Hmm, the correlation id is an application level information in each Kafka
> request. It is maintained by o.a.k.c.NetworkClient. It is not associated
> with TCP sessions. So even the TCP session disconnects and reconnects,
> the correlation id is not reset and will still be monotonically increasing.

Hi Becket,

That's a fair point.  I was thinking of previous RPC systems I worked
with.  But in Kafka, you're right that the correlation ID is maintained
by a single counter in NetworkClient, rather than being a counter
per-connection.

In any case, the correlation ID is there in order to associate a request
with a response within a single TCP session.  It's not unique, even on a
single node, if there is more than one NetworkClient.  It will get reset
to 0 any time we restart the process or re-create the NetworkClient
object.

> 
> Maybe I did not make it clear. I am not suggesting anything relying on
> TCP or transport layer. Everything is handled at application layer. From the
> clients perspective, the timeout is not defined as TCP timeout, it is
> defined as the upper bound of time it will wait before receiving a
> response. If the client did not receive a response before the timeout is
> reached, it will just retry. My suggestion was that as long as a
> FetchRequest needs to be retried, no matter for what reason, we just use
> a full FetchRequest. This does not depend on NetworkClient implementations,
> i.e. regardless of whether the retry is on the existing TCP connection or
> a new one.

So, with this proposal, if the TCP session drops, then the client needs
to retransmit, right?  That's why I said this proposal couples the TCP
session with the incremental fetch session.  In general, I don't see why
you would want to couple those two things.

If the network is under heavy load, it might cause a few TCP sessions to
drop.  If a dropped TCP session means that someone has to fall back to
sending a much larger full fetch request, that's a positive feedback
loop.  It could lead to congestion collapse.

In general, I think that the current KIP proposal, which allows an
incremental fetch session to persist across multiple TCP sessions, is
superior to a proposal which doesn't allow that.  It also avoids
worrying about message reordering within the server due to multiple
worker threads and delayed requests.  It's just simpler, easier, and
more efficient to have the sequence number than to not have it.

> 
> The question we are trying to answer here is essentially how to let the
> leader and followers agree on the messages in the log. And we are
> comparing
> the following two solutions:
> 1. Use something like a TCP ACK with epoch at Request/Response level.
> 2. Piggy back the leader knowledge at partition level for the follower to
> confirm.

The existing KIP proposal is not really similar to a TCP ACK.  A TCP ACK
involves sending back an actual ACK packet.  The KIP-227 proposal just
has an incrementing sequence number which the client increments each
time it successfully receives a response.

> 
> Personally I think (2) is better because (2) is more direct. The leader
> is the one who maintains all the state (LEOs) of the followers. At the end
> of the day, the leader just wants to make sure all those states are correct.
> (2) directly confirms those states with the followers instead of
> inferring that from a epoch.

The problem is that when using incremental updates, we can't "directly
confirm" that the follower and the leader are in sync.  For example,
suppose the follower loses a response which gives an update for some
partition.  Then, the partition is not changed after that.  The follower
has no way of knowing that that data is missing, just by looking at the
responses.  That's why it is so important to keep the follower and the
leader in lockstep by using the sequence number.

> Note that there is a subtle but maybe important
> difference between our use case of epoch and TCP seq. The difference is
> that a TCP ACK confirms all the packets with a lower seq has been
> received.
> In our case, a high epoch request does not mean all the data in the
> previous response was successful. So in the KIP, the statement of "When
> the leader receives a fetch request with epoch N + 1, it knows that the data
> it sent back for the fetch request with epoch N was successfully processed
> by the follower." could be tricky or expensive to make right in some cases.

Hmm, let me be more precise here.  The client increments the sequence
number by 1 if it wants new data that it didn't get in the last
transmission.  Otherwise, it resends the same sequence number, and gets
the same data that it got in the previous transmission.  The mechanism
for replication is unchanged-- the when the leader gets a request from a
follower for offset N of topic T, it assumes that the follower has
replicated messages up to, but not including, N.

There are no extra guarantees here.  In particular, note that we never
assume that the follower gets any response that we sent.  If the
follower didn't get a response, it's the follower's job to resend the
request with the same sequence number, and get the response.

> 
> Not sure if we have considered this, but when thinking of the above
> comparison, the following two potential issues came up:
> 
> 1. Thinking about the case of a consumer. If consumer.seek() or
> consumer.pause() is called. the consumer has essentially updated its
> interested set of topics or positions. This will needs a full
> FetchRequest to update the position on the leader. And thus create a new session. Now
> if users call seek()/pause() very often, the broker could run out of fetch
> session slot pretty quickly.

Well, Consumer#seek doesn't require the consumer to send a full fetch
request.  The consumer can simply update its offset in the next
incremental fetch request, in the usual way.

I agree that Consumer#pause does requires us to fall back to a full
fetch request.  Pause changes the set of partitions we are including in
the fetch request.  However, the clients which we are most interested
in, such as MirrorMaker, don't seem to use pause().  If we decide we
need to optimize this further, we certainly can.  We can add ways of
modifying the set of partitions that are fetched in an incremental fetch
session, for example.  But it is better to think about this optimization
later.

> 
> 2. Corrupted messages. If a fetch response has a corrupt message, the
> follower will back off for a while and try fetch again. During the back
> off period, the follower will not be fetching from the partition with corrupt
> message. And after the back off the partition will be added back. With
> the current design, it seems the follower will need to keep creating new
> sessions.

So, the latest proposal allows clients to reuse their existing session,
rather than creating a new one.  But they do still have to send a full
fetch request, rather than an incremental, when changing the set of
partitions.

> 
> In the above two cases, it might still be useful to let the session id be
> unique for each client instance (just like the producer id for the
> idempotent produce) and allow the client to update the leader side
> interested partitions and position with full FetchRequest without
> creating a new session id.

Yeah, I agree.  We should have a way to change the set of partitions in
the session.  That seems like something that should go in a follow-up
change.  And this is also an example of how staying in lockstep is
extremely important (we can't afford to lose or reorder the request that
added some partition to the session).

best,
Colin

> 
> Thanks,
> 
> Jiangjie (Becket) Qin
> 
> 
> 
> On Sun, Dec 3, 2017 at 12:55 PM, Colin McCabe <cmccabe@apache.org> wrote:
> 
> > On Sat, Dec 2, 2017, at 23:21, Becket Qin wrote:
> > > Thanks for the explanation, Colin. A few more questions.
> > >
> > > >The session epoch is not complex.  It's just a number which increments
> > > >on each incremental fetch.  The session epoch is also useful for
> > > >debugging-- it allows you to match up requests and responses when
> > > >looking at log files.
> > >
> > > Currently each request in Kafka has a correlation id to help match the
> > > requests and responses. Is epoch doing something differently?
> >
> > Hi Becket,
> >
> > The correlation ID is used within a single TCP session, to uniquely
> > associate a request with a response.  The correlation ID is not unique
> > (and has no meaning) outside the context of that single TCP session.
> >
> > Keep in mind, NetworkClient is in charge of TCP sessions, and generally
> > tries to hide that information from the upper layers of the code.  So
> > when you submit a request to NetworkClient, you don't know if that
> > request creates a TCP session, or reuses an existing one.
> >
> > >
> > > >Unfortunately, this doesn't work.  Imagine the client misses an
> > > >increment fetch response about a partition.  And then the partition is
> > > >never updated after that.  The client has no way to know about the
> > > >partition, since it won't be included in any future incremental fetch
> > > >responses.  And there are no offsets to compare, since the partition is
> > > >simply omitted from the response.
> > >
> > > I am curious about in which situation would the follower miss a response
> > > of a partition. If the entire FetchResponse is lost (e.g. timeout), the
> > > follower would disconnect and retry. That will result in sending a full
> > > FetchRequest.
> >
> > Basically, you are proposing that we rely on TCP for reliable delivery
> > in a distributed system.  That isn't a good idea for a bunch of
> > different reasons.  First of all, TCP timeouts tend to be very long.  So
> > if the TCP session timing out is your error detection mechanism, you
> > have to wait minutes for messages to timeout.  Of course, we add a
> > timeout on top of that after which we declare the connection bad and
> > manually close it.  But just because the session is closed on one end
> > doesn't mean that the other end knows that it is closed.  So the leader
> > may have to wait quite a long time before TCP decides that yes,
> > connection X from the follower is dead and not coming back, even though
> > gremlins ate the FIN packet which the follower attempted to translate.
> > If the cache state is tied to that TCP session, we have to keep that
> > cache around for a much longer time than we should.
> >
> > Secondly, from a software engineering perspective, it's not a good idea
> > to try to tightly tie together TCP and our code.  We would have to
> > rework how we interact with NetworkClient so that we are aware of things
> > like TCP sessions closing or opening.  We would have to be careful
> > preserve the ordering of incoming messages when doing things like
> > putting incoming requests on to a queue to be processed by multiple
> > threads.  It's just a lot of complexity to add, and there's no upside.
> >
> > Imagine that I made an argument that client IDs are "complex" and should
> > be removed from our APIs.  After all, we can just look at the remote IP
> > address and TCP port of each connection.  Would you think that was a
> > good idea?  The client ID is useful when looking at logs.  For example,
> > if a rebalance is having problems, you want to know what clients were
> > having a problem.  So having the client ID field to guide you is
> > actually much less "complex" in practice than not having an ID.
> >
> > Similarly, if metadata responses had epoch numbers (simple incrementing
> > numbers), we would not have to debug problems like clients accidentally
> > getting old metadata from servers that had been partitioned off from the
> > network for a while.  Clients would know the difference between old and
> > new metadata.  So putting epochs in to the metadata request is much less
> > "complex" operationally, even though it's an extra field in the request.
> >  This has been discussed before on the mailing list.
> >
> > So I think the bottom line for me is that having the session ID and
> > session epoch, while it adds two extra fields, reduces operational
> > complexity and increases debuggability.  It avoids tightly coupling us
> > to assumptions about reliable ordered delivery which tend to be violated
> > in practice in multiple layers of the stack.  Finally, it  avoids the
> > necessity of refactoring NetworkClient.
> >
> > best,
> > Colin
> >
> >
> > > If there is an error such as NotLeaderForPartition is
> > > returned for some partitions, the follower can always send a full
> > > FetchRequest. Is there a scenario that only some of the partitions in a
> > > FetchResponse is lost?
> > >
> > > Thanks,
> > >
> > > Jiangjie (Becket) Qin
> > >
> > >
> > > On Sat, Dec 2, 2017 at 2:37 PM, Colin McCabe <cmccabe@apache.org> wrote:
> > >
> > > > On Fri, Dec 1, 2017, at 11:46, Dong Lin wrote:
> > > > > On Thu, Nov 30, 2017 at 9:37 AM, Colin McCabe <cmccabe@apache.org>
> > > > wrote:
> > > > >
> > > > > > On Wed, Nov 29, 2017, at 18:59, Dong Lin wrote:
> > > > > > > Hey Colin,
> > > > > > >
> > > > > > > Thanks much for the update. I have a few questions below:
> > > > > > >
> > > > > > > 1. I am not very sure that we need Fetch Session Epoch. It seems
> > that
> > > > > > > Fetch
> > > > > > > Session Epoch is only needed to help leader distinguish between
> > "a
> > > > full
> > > > > > > fetch request" and "a full fetch request and request a new
> > > > incremental
> > > > > > > fetch session". Alternatively, follower can also indicate "a full
> > > > fetch
> > > > > > > request and request a new incremental fetch session" by setting
> > Fetch
> > > > > > > Session ID to -1 without using Fetch Session Epoch. Does this
> > make
> > > > sense?
> > > > > >
> > > > > > Hi Dong,
> > > > > >
> > > > > > The fetch session epoch is very important for ensuring
> > correctness.  It
> > > > > > prevents corrupted or incomplete fetch data due to network
> > reordering
> > > > or
> > > > > > loss.
> > > > > >
> > > > > > For example, consider a scenario where the follower sends a fetch
> > > > > > request to the leader.  The leader responds, but the response is
> > lost
> > > > > > because of network problems which affected the TCP session.  In
> > that
> > > > > > case, the follower must establish a new TCP session and re-send the
> > > > > > incremental fetch request.  But the leader does not know that the
> > > > > > follower didn't receive the previous incremental fetch response.
> > It is
> > > > > > only the incremental fetch epoch which lets the leader know that it
> > > > > > needs to resend that data, and not data which comes afterwards.
> > > > > >
> > > > > > You could construct similar scenarios with message reordering,
> > > > > > duplication, etc.  Basically, this is a stateful protocol on an
> > > > > > unreliable network, and you need to know whether the follower got
> > the
> > > > > > previous data you sent before you move on.  And you need to handle
> > > > > > issues like duplicated or delayed requests.  These issues do not
> > affect
> > > > > > the full fetch request, because it is not stateful-- any full fetch
> > > > > > request can be understood and properly responded to in isolation.
> > > > > >
> > > > >
> > > > > Thanks for the explanation. This makes sense. On the other hand I
> > would
> > > > > be interested in learning more about whether Becket's solution can
> > help
> > > > > simplify the protocol by not having the echo field and whether that
> > is
> > > > > worth doing.
> > > >
> > > > Hi Dong,
> > > >
> > > > I commented about this in the other thread.  A solution which doesn't
> > > > maintain session information doesn't work here.
> > > >
> > > > >
> > > > >
> > > > >
> > > > > >
> > > > > > >
> > > > > > > 2. It is said that Incremental FetchRequest will include
> > partitions
> > > > whose
> > > > > > > fetch offset or maximum number of fetch bytes has been changed.
> > If
> > > > > > > follower's logStartOffet of a partition has changed, should this
> > > > > > > partition also be included in the next FetchRequest to the
> > leader?
> > > > > > Otherwise, it
> > > > > > > may affect the handling of DeleteRecordsRequest because leader
> > may
> > > > not
> > > > > > know
> > > > > > > the corresponding data has been deleted on the follower.
> > > > > >
> > > > > > Yeah, the follower should include the partition if the
> > logStartOffset
> > > > > > has changed.  That should be spelled out on the KIP.  Fixed.
> > > > > >
> > > > > > >
> > > > > > > 3. In the section "Per-Partition Data", a partition is not
> > considered
> > > > > > > dirty if its log start offset has changed. Later in the section
> > > > > > "FetchRequest
> > > > > > > Changes", it is said that incremental fetch responses will
> > include a
> > > > > > > partition if its logStartOffset has changed. It seems
> > inconsistent.
> > > > Can
> > > > > > > you update the KIP to clarify it?
> > > > > > >
> > > > > >
> > > > > > In the "Per-Partition Data" section, it does say that
> > logStartOffset
> > > > > > changes make a partition dirty, though, right?  The first bullet
> > point
> > > > > > is:
> > > > > >
> > > > > > > * The LogCleaner deletes messages, and this changes the log start
> > > > offset
> > > > > > of the partition on the leader., or
> > > > > >
> > > > >
> > > > > Ah I see. I think I didn't notice this because statement assumes
> > that the
> > > > > LogStartOffset in the leader only changes due to LogCleaner. In fact
> > the
> > > > > LogStartOffset can change on the leader due to either log retention
> > and
> > > > > DeleteRecordsRequest. I haven't verified whether LogCleaner can
> > change
> > > > > LogStartOffset though. It may be a bit better to just say that a
> > > > > partition is considered dirty if LogStartOffset changes.
> > > >
> > > > I agree.  It should be straightforward to just resend the partition if
> > > > logStartOffset changes.
> > > >
> > > > >
> > > > >
> > > > > >
> > > > > > > 4. In "Fetch Session Caching" section, it is said that each
> > broker
> > > > has a
> > > > > > > limited number of slots. How is this number determined? Does this
> > > > require
> > > > > > > a new broker config for this number?
> > > > > >
> > > > > > Good point.  I added two broker configuration parameters to control
> > > > this
> > > > > > number.
> > > > > >
> > > > >
> > > > > I am curious to see whether we can avoid some of these new configs.
> > For
> > > > > example, incremental.fetch.session.cache.slots.per.broker is
> > probably
> > > > not
> > > > > necessary because if a leader knows that a FetchRequest comes from a
> > > > > follower, we probably want the leader to always cache the information
> > > > > from that follower. Does this make sense?
> > > >
> > > > Yeah, maybe we can avoid having
> > > > incremental.fetch.session.cache.slots.per.broker.
> > > >
> > > > >
> > > > > Maybe we can discuss the config later after there is agreement on
> > how the
> > > > > protocol would look like.
> > > > >
> > > > >
> > > > > >
> > > > > > > What is the error code if broker does
> > > > > > > not have new log for the incoming FetchRequest?
> > > > > >
> > > > > > Hmm, is there a typo in this question?  Maybe you meant to ask what
> > > > > > happens if there is no new cache slot for the incoming
> > FetchRequest?
> > > > > > That's not an error-- the incremental fetch session ID just gets
> > set to
> > > > > > 0, indicating no incremental fetch session was created.
> > > > > >
> > > > >
> > > > > Yeah there is a typo. You have answered my question.
> > > > >
> > > > >
> > > > > >
> > > > > > >
> > > > > > > 5. Can you clarify what happens if follower adds a partition to
> > the
> > > > > > > ReplicaFetcherThread after receiving LeaderAndIsrRequest? Does
> > leader
> > > > > > > needs to generate a new session for this ReplicaFetcherThread or
> > > > does it
> > > > > > re-use
> > > > > > > the existing session?  If it uses a new session, is the old
> > session
> > > > > > > actively deleted from the slot?
> > > > > >
> > > > > > The basic idea is that you can't make changes, except by sending a
> > full
> > > > > > fetch request.  However, perhaps we can allow the client to re-use
> > its
> > > > > > existing session ID.  If the client sets sessionId = id, epoch =
> > 0, it
> > > > > > could re-initialize the session.
> > > > > >
> > > > >
> > > > > Yeah I agree with the basic idea. We probably want to understand more
> > > > > detail about how this works later.
> > > >
> > > > Sounds good.  I updated the KIP with this information.  A
> > > > re-initialization should be exactly the same as an initialization,
> > > > except that it reuses an existing ID.
> > > >
> > > > best,
> > > > Colin
> > > >
> > > >
> > > > >
> > > > >
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > > BTW, I think it may be useful if the KIP can include the example
> > > > workflow
> > > > > > > of how this feature will be used in case of partition change and
> > so
> > > > on.
> > > > > >
> > > > > > Yeah, that might help.
> > > > > >
> > > > > > best,
> > > > > > Colin
> > > > > >
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Dong
> > > > > > >
> > > > > > >
> > > > > > > On Wed, Nov 29, 2017 at 12:13 PM, Colin McCabe <
> > cmccabe@apache.org>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > I updated the KIP with the ideas we've been discussing.
> > > > > > > >
> > > > > > > > best,
> > > > > > > > Colin
> > > > > > > >
> > > > > > > > On Tue, Nov 28, 2017, at 08:38, Colin McCabe wrote:
> > > > > > > > > On Mon, Nov 27, 2017, at 22:30, Jan Filipiak wrote:
> > > > > > > > > > Hi Colin, thank you  for this KIP, it can become a really
> > > > useful
> > > > > > thing.
> > > > > > > > > >
> > > > > > > > > > I just scanned through the discussion so far and wanted to
> > > > start a
> > > > > > > > > > thread to make as decision about keeping the
> > > > > > > > > > cache with the Connection / Session or having some sort of
> > UUID
> > > > > > indN
> > > > > > > > exed
> > > > > > > > > > global Map.
> > > > > > > > > >
> > > > > > > > > > Sorry if that has been settled already and I missed it. In
> > this
> > > > > > case
> > > > > > > > > > could anyone point me to the discussion?
> > > > > > > > >
> > > > > > > > > Hi Jan,
> > > > > > > > >
> > > > > > > > > I don't think anyone has discussed the idea of tying the
> > cache
> > > > to an
> > > > > > > > > individual TCP session yet.  I agree that since the cache is
> > > > > > intended to
> > > > > > > > > be used only by a single follower or client, it's an
> > interesting
> > > > > > thing
> > > > > > > > > to think about.
> > > > > > > > >
> > > > > > > > > I guess the obvious disadvantage is that whenever your TCP
> > > > session
> > > > > > > > > drops, you have to make a full fetch request rather than an
> > > > > > incremental
> > > > > > > > > one.  It's not clear to me how often this happens in
> > practice --
> > > > it
> > > > > > > > > probably depends a lot on the quality of the network.  From a
> > > > code
> > > > > > > > > perspective, it might also be a bit difficult to access data
> > > > > > associated
> > > > > > > > > with the Session from classes like KafkaApis (although we
> > could
> > > > > > refactor
> > > > > > > > > it to make this easier).
> > > > > > > > >
> > > > > > > > > It's also clear that even if we tie the cache to the
> > session, we
> > > > > > still
> > > > > > > > > have to have limits on the number of caches we're willing to
> > > > create.
> > > > > > > > > And probably we should reserve some cache slots for each
> > > > follower, so
> > > > > > > > > that clients don't take all of them.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Id rather see a protocol in which the client is hinting the
> > > > broker
> > > > > > > > that,
> > > > > > > > > > he is going to use the feature instead of a client
> > > > > > > > > > realizing that the broker just offered the feature
> > (regardless
> > > > of
> > > > > > > > > > protocol version which should only indicate that the
> > feature
> > > > > > > > > > would be usable).
> > > > > > > > >
> > > > > > > > > Hmm.  I'm not sure what you mean by "hinting."  I do think
> > that
> > > > the
> > > > > > > > > server should have the option of not accepting incremental
> > > > requests
> > > > > > from
> > > > > > > > > specific clients, in order to save memory space.
> > > > > > > > >
> > > > > > > > > > This seems to work better with a per
> > > > > > > > > > connection/session attached Metadata than with a Map and
> > could
> > > > > > allow
> > > > > > > > for
> > > > > > > > > > easier client implementations.
> > > > > > > > > > It would also make Client-side code easier as there
> > wouldn't
> > > > be any
> > > > > > > > > > Cache-miss error Messages to handle.
> > > > > > > > >
> > > > > > > > > It is nice not to have to handle cache-miss responses, I
> > agree.
> > > > > > > > > However, TCP sessions aren't exposed to most of our
> > client-side
> > > > code.
> > > > > > > > > For example, when the Producer creates a message and hands it
> > > > off to
> > > > > > the
> > > > > > > > > NetworkClient, the NC will transparently re-connect and
> > re-send a
> > > > > > > > > message if the first send failed.  The higher-level code will
> > > > not be
> > > > > > > > > informed about whether the TCP session was re-established,
> > > > whether an
> > > > > > > > > existing TCP session was used, and so on.  So overall I would
> > > > still
> > > > > > lean
> > > > > > > > > towards not coupling this to the TCP session...
> > > > > > > > >
> > > > > > > > > best,
> > > > > > > > > Colin
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >   Thank you again for the KIP. And again, if this was
> > clarified
> > > > > > already
> > > > > > > > > > please drop me a hint where I could read about it.
> > > > > > > > > >
> > > > > > > > > > Best Jan
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On 21.11.2017 22:02, Colin McCabe wrote:
> > > > > > > > > > > Hi all,
> > > > > > > > > > >
> > > > > > > > > > > I created a KIP to improve the scalability and latency of
> > > > > > > > FetchRequest:
> > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > > > > 227%3A+Introduce+Incremental+FetchRequests+to+Increase+
> > > > > > > > Partition+Scalability
> > > > > > > > > > >
> > > > > > > > > > > Please take a look.
> > > > > > > > > > >
> > > > > > > > > > > cheers,
> > > > > > > > > > > Colin
> > > > > > > > > >
> > > > > > > >
> > > > > >
> > > >
> >