kafka-dev mailing list archives

From "Guozhang Wang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KAFKA-1286) Retry Can Block
Date Tue, 04 Mar 2014 19:05:23 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-1286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919775#comment-13919775 ]

Guozhang Wang commented on KAFKA-1286:

Updated reviewboard https://reviews.apache.org/r/18740/ against branch origin/trunk

> Retry Can Block 
> ----------------
>                 Key: KAFKA-1286
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1286
>             Project: Kafka
>          Issue Type: Sub-task
>          Components: producer 
>            Reporter: Guozhang Wang
>         Attachments: KAFKA-1286.patch, KAFKA-1286_2014-03-04_11:04:32.patch
> Under the following scenario the retry logic can block:
> 1. The last broker's socket is closed, which triggers sender.handleDisconnect() and marks the node as disconnected.
> 2. In the next sender.run(), since the node is disconnected, the partition is removed from the ready set and sender.initConnection() is called, which does not throw an exception.
> 3. So in this round of sends, the only request the sender tries to send is the metadata request, addressed to the last broker; the sender will first try to connect to that broker.
> 4. In selector.poll(), the finishConnect() call throws an exception, and in handleDisconnects() the inFlight request's batches are null since it is a metadata request.
> 5. We are now back at step 1 and loop forever. Note that this infinite loop can be triggered even without calling producer.close.
> Also, we need to introduce a retry backoff config; otherwise the retries will be exhausted too soon (in my tests, 10 retries can be exhausted in about 600 ms).
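
To illustrate why a retry backoff matters, here is a minimal sketch of the arithmetic, not the actual producer code. The class name, the method name, and the 60 ms per-attempt cost are assumptions made for illustration (the cost is simply the reported 600 ms divided by 10 retries), and the 100 ms backoff is an arbitrary example value.

```java
public class RetryBackoffSketch {

    /**
     * Hypothetical helper: simulated time (ms) until all retries are used up,
     * assuming every retry fails after attemptCostMs and the sender waits
     * retryBackoffMs before each retry.
     */
    static long timeToExhaustRetriesMs(int retries, long attemptCostMs, long retryBackoffMs) {
        long nowMs = 0;
        for (int attempt = 0; attempt < retries; attempt++) {
            nowMs += retryBackoffMs; // wait before retrying (0 means retry immediately)
            nowMs += attemptCostMs;  // the retry itself fails after this long
        }
        return nowMs;
    }

    public static void main(String[] args) {
        // Assumed per-attempt cost: about 600 ms / 10 retries from the report above.
        long attemptCostMs = 60;

        // Without backoff, 10 retries are gone in 600 ms.
        System.out.println("no backoff: " + timeToExhaustRetriesMs(10, attemptCostMs, 0) + " ms");

        // With an example backoff of 100 ms, the same 10 retries span 1600 ms,
        // giving a briefly unavailable broker more time to come back.
        System.out.println("100 ms backoff: " + timeToExhaustRetriesMs(10, attemptCostMs, 100) + " ms");
    }
}
```

The point of the sketch is only that the window covered by a fixed retry count grows linearly with the backoff, so the backoff needs to be configurable alongside the retry count.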

This message was sent by Atlassian JIRA