kafka-dev mailing list archives

From "Ismael Juma (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (KAFKA-4007) Improve fetch pipelining for low values of max.poll.records
Date Wed, 30 Nov 2016 09:43:59 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-4007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15708070#comment-15708070
] 

Ismael Juma edited comment on KAFKA-4007 at 11/30/16 9:43 AM:
--------------------------------------------------------------

I think it's worth clarifying that a new fetch request is sent when the previous data for a
partition is exhausted, so if the consumer is consuming from multiple partitions, we
won't necessarily be waiting after every request with max.poll.records=1. KAFKA-4405
is also relevant: there, the cost of prefetching after every `poll` seems to cause lower performance
for the streams benchmark. See also KAFKA-4469, which reduces some of the overhead when max.poll.records
is smaller than the fetch size.
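
For reference, the settings discussed here could be written as below. This is a minimal sketch using plain `java.util.Properties`. The class and method names are hypothetical, and it assumes that "fetch size" refers to the consumer's `max.partition.fetch.bytes` setting. The byte value shown is that setting's documented default, not a recommendation from this thread.

```java
import java.util.Properties;

// Hypothetical helper, for illustration only.
final class LowMaxPollRecordsConfig {
    static Properties build() {
        Properties props = new Properties();
        // Return at most one record per poll() call.
        props.setProperty("max.poll.records", "1");
        // Assumed meaning of "fetch size": per-partition fetch limit in bytes.
        // A single fetch response can therefore hold far more than one record,
        // which is then handed out over many poll() calls.
        props.setProperty("max.partition.fetch.bytes", "1048576");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(build());
    }
}
```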


was (Author: ijuma):
I think it's worth clarifying that a new fetch request is sent if the previous data for a
partition is exhausted, so if the consumer is consuming from multiple partitions, then we
won't necessarily be waiting after every request with max.poll.records. KAFKA-4405 is
also relevant where the cost of prefetching after every `poll` seems to cause lower performance
for the streams benchmark (and KAFKA-4469 that reduces some of the overhead when max.poll.records
is smaller than the fetch size).

> Improve fetch pipelining for low values of max.poll.records
> -----------------------------------------------------------
>
>                 Key: KAFKA-4007
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4007
>             Project: Kafka
>          Issue Type: Improvement
>          Components: consumer
>            Reporter: Jason Gustafson
>            Assignee: Mickael Maison
>
> Currently the consumer will only send a prefetch for a partition after all the records
from the previous fetch have been consumed. This can lead to suboptimal pipelining when max.poll.records
is set very low since the processing latency for a small set of records may be small compared
to the latency of a fetch. An improvement suggested by [~junrao] is to send the fetch anyway
even if we have unprocessed data buffered, but delay reading it from the socket until that
data has been consumed. Potentially the consumer can delay reading _any_ pending fetch until
it is ready to be returned to the user, which may help control memory better. 
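
The pipelining argument in the description can be illustrated with a toy timing model. This is only a sketch under stated assumptions, not the consumer's actual implementation: it models a single partition, a fixed fetch round-trip time, a fixed per-record processing time, and an idealized "pipelined" mode in which the next fetch is sent as soon as the previous response arrives. All class, method, and parameter names are hypothetical.

```java
// Toy model only: single partition, constant latencies, no Kafka dependency.
final class FetchPipeliningSketch {

    // Behaviour described in the issue: the next fetch is sent only after
    // every record from the previous fetch has been consumed.
    static long drainThenFetchMs(int batches, int recordsPerFetch,
                                 long fetchLatencyMs, long processMsPerRecord) {
        long clock = 0;
        for (int i = 0; i < batches; i++) {
            clock += fetchLatencyMs;                       // wait for the fetch
            clock += recordsPerFetch * processMsPerRecord; // then process it
        }
        return clock;
    }

    // Idealized version of the suggested improvement: the next fetch is sent
    // while buffered records are still being processed, so the two overlap.
    static long pipelinedMs(int batches, int recordsPerFetch,
                            long fetchLatencyMs, long processMsPerRecord) {
        long clock = fetchLatencyMs; // the first response cannot be overlapped
        for (int i = 0; i < batches; i++) {
            boolean more = i < batches - 1;
            // The next fetch is sent now and completes fetchLatencyMs later.
            long nextReady = more ? clock + fetchLatencyMs : 0;
            clock += recordsPerFetch * processMsPerRecord;
            if (more) {
                // Wait only if processing finished before the response arrived.
                clock = Math.max(clock, nextReady);
            }
        }
        return clock;
    }

    public static void main(String[] args) {
        // 10 fetches of 10 records, 20 ms per fetch, 1 ms per record.
        System.out.println(drainThenFetchMs(10, 10, 20, 1)); // 300
        System.out.println(pipelinedMs(10, 10, 20, 1));      // 210
    }
}
```

In this model the gap between the two modes grows as processing time per fetch shrinks relative to fetch latency, which matches the case the description has in mind.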



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
