kafka-dev mailing list archives

From "Neha Narkhede (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KAFKA-706) broker appears to be encoding ProduceResponse, but never sending it
Date Fri, 08 Feb 2013 05:49:12 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13574275#comment-13574275 ]

Neha Narkhede commented on KAFKA-706:

Great catch, Sriram! I think the v2 patch on KAFKA-736 might solve this problem.
> broker appears to be encoding ProduceResponse, but never sending it
> -------------------------------------------------------------------
>                 Key: KAFKA-706
>                 URL: https://issues.apache.org/jira/browse/KAFKA-706
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.8
>         Environment: Reproduced on both Mac OS and RH Linux, via a private node.js client
>            Reporter: ben fleis
>            Assignee: Sriram Subramanian
> By all appearances, I can get a broker to periodically encode, but never transmit, a ProduceResponse. Unfortunately my client is proprietary, but I will share it with Neha via LI channels. In the meantime I will describe what's going on, in the hope that there's another simple way to reproduce it. (I did search through JIRA and haven't found anything that looks like this.)
> I am running a single-instance ZooKeeper and a single broker. I have a client that generates configurable amounts of data, tracking what is produced (both sent and ACK'd) and what is consumed. I noticed that at high transfer rates, driven by sending single messages at high frequency, my unacked queue appeared to grow continuously. So I instrumented my client to log correlation IDs at various stages, and modified Kafka's ProducerRequest/ProducerResponse to log their (de)serialization as well. I then used tcpdump to capture all communication between my client and the broker. Finally, I configured my client to generate 1 message every ~10 ms, each payload approximately 33 bytes, with requestAckTimeout set to 2000 ms and requestAcksRequired set to 1. I chose 10 ms because at 5 ms or less my unacked queue built up simply because the system couldn't keep up; 10 ms keeps the load high but manageable. YMMV with that parameter. All of this runs on a single host, over loopback. I ran it on both my MacBook Air and a well-configured RH Linux box, and found the same behavior.
> At startup, my system logged "expired" requests, meaning requests that were sent but for which no ACK, positive or negative, was received from the broker within 1.25x the requestAckTimeout (i.e., 2500 ms). I let it settle until the unacked queue was stable at or around 0.
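The expiry rule described above (a request is "expired" if no ACK, positive or negative, arrives within 1.25x requestAckTimeout) can be sketched as a small tracker keyed by correlation ID. This is a minimal Python illustration under stated assumptions, not the reporter's proprietary node.js client: the class and method names are hypothetical, and the current time is passed in explicitly so the logic is easy to test.

```python
from typing import Dict, List


class UnackedTracker:
    """Hypothetical client-side tracker of in-flight produce requests.

    Requests are keyed by correlation ID. A request counts as "expired" if no
    response arrives within grace_factor x the ack timeout (1.25x here).
    """

    def __init__(self, request_ack_timeout_ms: int, grace_factor: float = 1.25):
        self.expiry_ms = request_ack_timeout_ms * grace_factor
        # correlation ID -> time the request was sent (ms)
        self.in_flight: Dict[int, float] = {}

    def sent(self, correlation_id: int, now_ms: float) -> None:
        self.in_flight[correlation_id] = now_ms

    def acked(self, correlation_id: int) -> bool:
        # False means the ACK was for an ID we are not tracking (late or unknown).
        return self.in_flight.pop(correlation_id, None) is not None

    def expire(self, now_ms: float) -> List[int]:
        # Collect first, then delete, so the dict is not mutated while iterating.
        expired = [cid for cid, sent_at in self.in_flight.items()
                   if now_ms - sent_at > self.expiry_ms]
        for cid in expired:
            del self.in_flight[cid]
        return sorted(expired)
```

With requestAckTimeout at 2000 ms, the tracker's expiry threshold is 2500 ms, matching the figure in the report; a steadily growing `in_flight` map corresponds to the "unacked queue" that kept getting larger.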
> What I found is this: ACKs are normally generated within milliseconds, as shown by the logging I added to the Scala ProducerRe* classes, and my client normally sees them quickly. But when the error occurs, namely when a request is ignored, the ProducerResponse class *does* encode the correct correlationId; however, a response containing that ID is never sent over the network, as my tcpdump traces show. In my experience, once the system was warm, this happened every 3-15 seconds; at roughly 100 requests per second (one every ~10 ms), that is on average about 1 in every 300 to 1,500 requests.
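The diagnosis rests on comparing the correlation IDs the broker logs as encoded against those actually observed on the wire. A minimal sketch of that comparison follows, assuming the broker-to-client TCP payload has already been reassembled from the tcpdump capture into raw bytes. It relies on Kafka's response framing, in which each response begins with a 4-byte big-endian size followed by a 4-byte correlation ID; the function names are hypothetical.

```python
import struct
from typing import Iterable, List, Set


def response_correlation_ids(stream: bytes) -> List[int]:
    """Split a broker->client byte stream into Kafka responses and return
    the correlation ID of each complete response, in order.

    Each response is framed as a 4-byte big-endian size N followed by N bytes,
    the first 4 of which are the correlation ID.
    """
    ids = []
    offset = 0
    while offset + 4 <= len(stream):
        (size,) = struct.unpack_from(">i", stream, offset)
        if size < 4 or offset + 4 + size > len(stream):
            break  # truncated or malformed frame: stop rather than guess
        (correlation_id,) = struct.unpack_from(">i", stream, offset + 4)
        ids.append(correlation_id)
        offset += 4 + size
    return ids


def encoded_but_not_sent(encoded_ids: Iterable[int], wire_ids: Iterable[int]) -> Set[int]:
    """Correlation IDs the broker logged as encoded but never seen on the wire."""
    return set(encoded_ids) - set(wire_ids)
```

An ID returned by `encoded_but_not_sent` is a response the broker built but that never appeared in the capture, which is a candidate for the symptom reported here (assuming the capture itself dropped nothing).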
> While I can't attach my client code, I could attach logs; but since I intend to share the code with LI people, I will wait to see whether that's useful here.
