kafka-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sriram Subramanian (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (KAFKA-706) broker appears to be encoding ProduceResponse, but never sending it
Date Fri, 08 Feb 2013 00:07:13 GMT

     [ https://issues.apache.org/jira/browse/KAFKA-706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Sriram Subramanian reassigned KAFKA-706:
----------------------------------------

    Assignee: Sriram Subramanian
    
> broker appears to be encoding ProduceResponse, but never sending it
> -------------------------------------------------------------------
>
>                 Key: KAFKA-706
>                 URL: https://issues.apache.org/jira/browse/KAFKA-706
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.8
>         Environment: reproduced on both Mac OS and RH linux, via private node.js client
>            Reporter: ben fleis
>            Assignee: Sriram Subramanian
>
> By all appearances, I seem to be able to convince a broker to periodically encode, but
never transmit, a ProduceResponse.  Unfortunately my client is proprietary, but I will share
it with Neha via LI channels.  But I will describe what's going on in the hopes that there's
another trivial way to reproduce it.  (I did search through JIRA, and haven't found anything
that looks like this.)
> I am running a single instance zookeeper and single broker.  I have a client that generates
configurable amounts of data, tracking what is produced (both sent and ACK'd), and what is
consumed.  I was noticing that when using high transfer rates via high frequency single messages,
my unack'd queue appeared to be getting continuously larger.  So, I outfitted my client to
log more information about correlation ids at various stages, and modified the kafka ProducerRequest/ProducerResponse
to log (de)serialization of the same.  I then used tcpdump to intercept all communications
between my client and the broker.  Finally, I configured my client to generate 1 message per
~10ms, each payload being approximately 33 bytes; requestAckTimeout was set to 2000ms, and
requestAcksRequired was set to 1.  I used 10ms as I found that 5ms or less caused my unacked
queue to build up due to system speed -- it simply couldn't keep up.  10ms keeps the load
high, but just manageable.  YMMV with that param.  All of this is done on a single host, over
loopback.  I ran it on both my airbook, and a well setup RH linux box, and found the same
problem.
> At startup, my system logged "expired" requests - meaning reqs that were sent, but for
which no ACK, positive or negative, was seen from the broker, within 1.25x the requestAckTimeout
(ie, 2500ms).  I would let it settle until the unacked queue was stable at or around 0.
> What I found is this: ACKs are normally generated within milliseconds.  This was demonstrated
by my logging added to the scala ProducerRe* classes, and they are normally seen quickly by
my client.  But when the actual error occurs, namely that a request is ignored, the ProducerResponse
class *does* encode the correct correlationId; however, a response containing that ID is never
sent over the network, as evidenced by my tcpdump traces.  In my experience this would take
anywhere from 3-15 seconds to occur after the system was warm, meaning that it's 1 out of
several hundred on average that shows the condition.
> While I can't attach my client code, I could attach logs; but since my intention is to
share the code with LI people, I will wait to see if that's useful here.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message