kafka-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jason Rosenberg <...@squareup.com>
Subject Re: produce request failed: due to Leader not local for partition
Date Sat, 29 Jun 2013 13:21:42 GMT
I added this scenario to KAFKA-955.

I'm thinking that this scenario could be a problem for ack=0 in general
(even without controlled shutdown).  If we do an "uncontrolled" shutdown,
it seems that some topics won't ever know there could have been a leader
change.  Would it make sense to force a meta-data refresh for all topics on
a broker, any time an IOException happens on a socket (e.g. "connection
reset")?  Currently, it looks like only the topic that experiences the
failure will have a metadata refresh issued for it.

Maybe this should be a separate jira issue, now that I think about it.

Jason


On Mon, Jun 24, 2013 at 10:52 PM, Jason Rosenberg <jbr@squareup.com> wrote:

> Also, looking back at my logs, I'm wondering if a producer will reuse the
> same socket to send data to the same broker, for multiple topics (I'm
> guessing yes).  In which case, it looks like I'm seeing this scenario:
>
> 1. producer1 is happily sending messages for topicX and topicY to serverA
> (serverA is the leader for both topics, only 1 partition for each topic for
> simplicity).
> 2. serverA is restarted, and in the process, serverB becomes the new
> leader for both topicX and topicY.
> 3. producer1 decides to send a new message to topicX to serverA.
> 3a. this results in an exception ("Connection reset by peer").
>  producer1's connection to serverA is invalidated.
> 3b. producer1 makes a new metadata request for topicX, and learns that
> serverB is now the leader for topicX.
> 3c. producer1 resends the message to topicX, on serverB.
> 4. producer1 decides to send a new message to topicY to serverA.
> 4a. producer1 notes that it's socket to serverA is invalid, so it creates
> a new connection to serverA.
> 4b. producer1 successfully sends it's message to serverA (without
> realizing that serverA is no longer the leader for topicY).
> 4c. serverA logs to it's console:
> 2013-06-23 08:28:46,770  WARN [kafka-request-handler-2] server.KafkaApis -
> [KafkaApi-508818741] Produce request with correlation id 7136261 from
> client  on partition [mytopic,0] failed due to Leader not local for
> partition [mytopic,0] on broker 508818741
> 5. producer1 continues to send messages for topicY to serverA, and serverA
> continues to log the same messages.
> 6. 10 minutes later, producer1 decides to update it's metadata for topicY,
> and learns that serverB is now the leader for topidY.
> 7. the warning messages finally stop in the console for serverA.
>
> I am pretty sure this scenario, or one very close to it, is what I'm
> seeing in my logs, after doing a rolling restart, with controlled shutdown.
>
> Does this scenario make sense?
>
> One thing I notice, is that in the steady state, every 10 minutes the
> producer refreshes it's metadata for all topics.  However, when sending a
> message to a specific topic fails, only the metadata for that topic is
> refreshed, even though the ramifications should be that all topics which
> have the same leader might need to be refreshed, especially in response to
> a "connection reset by peer".
>
> Jason
>
>
>
> On Mon, Jun 24, 2013 at 10:14 PM, Jason Rosenberg <jbr@squareup.com>wrote:
>
>> Jun,
>>
>> To be clear, this whole discussion was started, because I am clearly
>> seeing "failed due to Leader not local" on the last broker restarted,
>> after all the controlled shutting down has completed and all brokers
>> restarted.
>>
>> This leads me to believe that a client made a meta data request and found
>> out that server A was the leader for it's partition, and then server A was
>> restarted, and then the client makes repeated producer requests to server
>> A, without encountering a broken socket.  Thus, I'm not sure it's correct
>> that the socket is invalidated in that case after a restart.
>>
>> Alternatively, could it be that the client (which sends messages to
>> multiple topics), gets metadata updates for multiple topics, but doesn't
>> attempt to send a message to topicX until after the leader has changed and
>> server A has been restarted.  In this case, if it's the first time the
>> producer sends to topicX, does it only then create a new socket?
>>
>> Jason
>>
>>
>> On Mon, Jun 24, 2013 at 10:00 PM, Jun Rao <junrao@gmail.com> wrote:
>>
>>> That should be fine since the old socket in the producer will no longer
>>> be
>>> usable after a broker is restarted.
>>>
>>> Thanks,
>>>
>>> Jun
>>>
>>>
>>> On Mon, Jun 24, 2013 at 9:50 PM, Jason Rosenberg <jbr@squareup.com>
>>> wrote:
>>>
>>> > What about a non-controlled shutdown, and a restart, but the producer
>>> never
>>> > attempts to send anything during the time the broker was down?  That
>>> could
>>> > have caused a leader change, but without the producer knowing to
>>> refresh
>>> > it's metadata, no?
>>> >
>>> >
>>> > On Mon, Jun 24, 2013 at 9:05 PM, Jun Rao <junrao@gmail.com> wrote:
>>> >
>>> > > Other than controlled shutdown, the only other case that can cause
>>> the
>>> > > leader to change when the underlying broker is alive is when the
>>> broker
>>> > > expires its ZK session (likely due to GC), which should be rare. That
>>> > being
>>> > > said, forwarding in the broker may not be a bad idea. Could you file
>>> a
>>> > jira
>>> > > to track this?
>>> > >
>>> > > Thanks,
>>> > >
>>> > > Jun
>>> > >
>>> > >
>>> > > On Mon, Jun 24, 2013 at 2:50 PM, Jason Rosenberg <jbr@squareup.com>
>>> > wrote:
>>> > >
>>> > > > Yeah,
>>> > > >
>>> > > > I see that with ack=0, the producer will be in a bad state anytime
>>> the
>>> > > > leader for it's partition has changed, while the broker that it
>>> thinks
>>> > is
>>> > > > the leader is still up.  So this is a problem in general, not
only
>>> for
>>> > > > controlled shutdown, but even for the case where you've restarted
a
>>> > > server
>>> > > > (without controlled shutdown), which in and of itself can force
a
>>> > leader
>>> > > > change.  If the producer doesn't attempt to send a message during
>>> the
>>> > > time
>>> > > > the broker was down, it will never get a connection failure, and
>>> never
>>> > > get
>>> > > > fresh metadata, and subsequently start sending messages to the
>>> > > non-leader.
>>> > > >
>>> > > > Thus, I'd say this is a problem with ack=0, regardless of
>>> controlled
>>> > > > shutdown.  Any time there's a leader change, the producer will
send
>>> > > > messages into the ether.  I think this is actually a severe
>>> condition,
>>> > > that
>>> > > > could be considered a bug.  How hard would it be to have the
>>> receiving
>>> > > > broker forward on to the leader, in this case?
>>> > > >
>>> > > > Jason
>>> > > >
>>> > > >
>>> > > > On Mon, Jun 24, 2013 at 8:44 AM, Joel Koshy <jjkoshy.w@gmail.com>
>>> > wrote:
>>> > > >
>>> > > > > I think Jason was suggesting quiescent time as a possibility
>>> only if
>>> > > the
>>> > > > > broker did request forwarding if it is not the leader.
>>> > > > >
>>> > > > > On Monday, June 24, 2013, Jun Rao wrote:
>>> > > > >
>>> > > > > > Jason,
>>> > > > > >
>>> > > > > > The quiescence time that you proposed won't work. The
reason is
>>> > that
>>> > > > with
>>> > > > > > ack=0, the producer starts losing data silently from
the
>>> moment the
>>> > > > > leader
>>> > > > > > is moved (by controlled shutdown) until the broker is
shut
>>> down.
>>> > So,
>>> > > > the
>>> > > > > > sooner that you can shut down the broker, the better.
What we
>>> > > realized
>>> > > > is
>>> > > > > > that if you can use a larger batch size, ack=1 can still
>>> deliver
>>> > very
>>> > > > > good
>>> > > > > > throughput.
>>> > > > > >
>>> > > > > > Thanks,
>>> > > > > >
>>> > > > > > Jun
>>> > > > > >
>>> > > > > >
>>> > > > > > On Mon, Jun 24, 2013 at 12:22 AM, Jason Rosenberg <
>>> > jbr@squareup.com
>>> > > > > <javascript:;>>
>>> > > > > > wrote:
>>> > > > > >
>>> > > > > > > Yeah I am using ack = 0, so that makes sense. 
I'll need to
>>> > rethink
>>> > > > > that,
>>> > > > > > > it would seem.  It would be nice, wouldn't it,
in this case,
>>> for
>>> > > the
>>> > > > > > broker
>>> > > > > > > to realize this and just forward the messages to
the correct
>>> > > leader.
>>> > > > > >  Would
>>> > > > > > > that be possible?
>>> > > > > > >
>>> > > > > > > Also, it would be nice to have a second option
to the
>>> controlled
>>> > > > > shutdown
>>> > > > > > > (e.g. controlled.shutdown.quiescence.ms), to allow
the
>>> broker to
>>> > > > wait
>>> > > > > > > after
>>> > > > > > > the controlled shutdown, a prescribed amount of
time before
>>> > > actually
>>> > > > > > > shutting down the server. Then, I could set this
value to
>>> > > something a
>>> > > > > > > little greater than the producer's '
>>> > > > topic.metadata.refresh.interval.ms
>>> > > > > '.
>>> > > > > > >  This would help with hitless rolling restarts
too.
>>>  Currently,
>>> > > every
>>> > > > > > > producer gets a very loud "Connection Reset" with
a tall
>>> stack
>>> > > trace
>>> > > > > each
>>> > > > > > > time I restart a broker.  Would be nicer to have
the
>>> producers
>>> > > still
>>> > > > be
>>> > > > > > > able to produce until the metadata refresh interval
expires,
>>> then
>>> > > get
>>> > > > > the
>>> > > > > > > word that the leader has moved due to the controlled
>>> shutdown,
>>> > and
>>> > > > then
>>> > > > > > > start producing to the new leader, all before the
shutting
>>> down
>>> > > > server
>>> > > > > > > actually shuts down.  Does that seem feasible?
>>> > > > > > >
>>> > > > > > > Jason
>>> > > > > > >
>>> > > > > > >
>>> > > > > > > On Sun, Jun 23, 2013 at 8:23 PM, Jun Rao <junrao@gmail.com
>>> > > > > <javascript:;>>
>>> > > > > > wrote:
>>> > > > > > >
>>> > > > > > > > Jason,
>>> > > > > > > >
>>> > > > > > > > Are you using ack = 0 in the producer? This
mode doesn't
>>> work
>>> > > well
>>> > > > > with
>>> > > > > > > > controlled shutdown (this is explained in
FAQ i*n
>>> > > > > > > >
>>> > > > >
>>> > https://cwiki.apache.org/confluence/display/KAFKA/Replication+tools#)*
>>> > > > > > > > *
>>> > > > > > > > *
>>> > > > > > > > Thanks,
>>> > > > > > > >
>>> > > > > > > > Jun
>>> > > > > > > >
>>> > > > > > > >
>>> > > > > > > > On Sun, Jun 23, 2013 at 1:45 AM, Jason Rosenberg
<
>>> > > jbr@squareup.com
>>> > > > > <javascript:;>
>>> > > > > > >
>>> > > > > > > wrote:
>>> > > > > > > >
>>> > > > > > > > > I'm working on trying on having seamless
rolling
>>> restarts for
>>> > > my
>>> > > > > > kafka
>>> > > > > > > > > servers, running 0.8.  I have it so that
each server
>>> will be
>>> > > > > > restarted
>>> > > > > > > > > sequentially.  Each server takes itself
out of the load
>>> > > balancer
>>> > > > > > (e.g.
>>> > > > > > > > sets
>>> > > > > > > > > a status that the lb will recognize,
and then waits more
>>> than
>>> > > > long
>>> > > > > > > enough
>>> > > > > > > > > for the lb to stop sending meta-data
requests to that
>>> > server).
>>> > > > >  Then
>>> > > > > > I
>>> > > > > > > > > initiate the shutdown (with
>>> controlled.shutdown.enable=true).
>>> > > >  This
>>> > > > > > > seems
>>> > > > > > > > > to work well, however, I occasionally
see warnings like
>>> this
>>> > in
>>> > > > the
>>> > > > > > log
>>> > > > > > > > > from the server, after restart:
>>> > > > > > > > >
>>> > > > > > > > > 2013-06-23 08:28:46,770  WARN [kafka-request-handler-2]
>>> > > > > > > server.KafkaApis
>>> > > > > > > > -
>>> > > > > > > > > [KafkaApi-508818741] Produce request
with correlation id
>>> > > 7136261
>>> > > > > from
>>> > > > > > > > > client  on partition [mytopic,0] failed
due to Leader not
>>> > local
>>> > > > for
>>> > > > > > > > > partition [mytopic,0] on broker 508818741
>>> > > > > > > > >
>>> > > > > > > > > This WARN seems to persistently repeat,
until the
>>> producer
>>> > > client
>>> > > > > > > > initiates
>>> > > > > > > > > a new meta-data request (e.g. every 10
minutes, by
>>> default).
>>> > > > > >  However,
>>> > > > > > > > the
>>> > > > > > > > > producer doesn't log any errors/exceptions
when the
>>> server is
>>> > > > > logging
>>> > > > > > > > this
>>> > > > > > > > > WARN.
>>> > > > > > > > >
>>> > > > > > > > > What's happening here?  Is the message
silently being
>>> > forwarded
>>> > > > on
>>> > > > > to
>>> > > > > > > the
>>> > > > > > > > > correct leader for the partition?  Is
the message
>>> dropped?
>>> >  Are
>>> > > > > these
>>> > > > > > > > WARNS
>>> > > > > > > > > particularly useful?
>>> > > > > > > > >
>>> > > > > > > > > Thanks,
>>> > > > > > > > >
>>> > > > > > > > > Jason
>>> > > > > > > > >
>>> > > > > > > >
>>> > > > > > >
>>> > > > > >
>>> > > > >
>>> > > >
>>> > >
>>> >
>>>
>>
>>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message