kafka-jira mailing list archives

From "Dong Lin (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (KAFKA-6258) SSLTransportLayer should keep reading from socket until either the buffer is full or the socket has no more data
Date Wed, 22 Nov 2017 08:26:00 GMT

     [ https://issues.apache.org/jira/browse/KAFKA-6258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dong Lin updated KAFKA-6258:
----------------------------
    Description: 
When the consumer uses plaintext and there is remaining data in the consumer's buffer, consumer.poll()
will read all data available from the socket buffer into the consumer buffer. However, if the consumer
uses ssl and there is remaining data, consumer.poll() may read only 16 KB (the size of SslTransportLayer.appReadBuffer)
from the socket buffer. This reduces the efficiency of consumer.poll(), because the user must call
poll() more times to get the same amount of data.
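
The difference between the two read strategies can be sketched as below. This is a minimal
illustration, not Kafka's SslTransportLayer code: the class DrainLoopSketch, its methods, and
the use of a plain ByteBuffer as a stand-in for the socket are all assumptions made for the
example, and real TLS unwrapping is omitted. Only the 16 KB intermediate buffer size comes
from the description above.

```java
import java.nio.ByteBuffer;

public class DrainLoopSketch {
    // Hypothetical stand-in for the 16 KB application read buffer mentioned above.
    static final int APP_BUFFER_SIZE = 16 * 1024;

    /** Copies at most one intermediate buffer's worth of data per call. */
    static int readOnce(ByteBuffer socket, ByteBuffer dst) {
        int n = Math.min(APP_BUFFER_SIZE, Math.min(socket.remaining(), dst.remaining()));
        ByteBuffer chunk = socket.slice(); // view of the unread socket data
        chunk.limit(n);                    // restrict the view to n bytes
        dst.put(chunk);                    // copy those n bytes into the destination
        socket.position(socket.position() + n);
        return n;
    }

    /** Keeps reading until dst is full or the socket has no more data. */
    static int readUntilFullOrDrained(ByteBuffer socket, ByteBuffer dst) {
        int total = 0;
        while (dst.hasRemaining() && socket.hasRemaining()) {
            total += readOnce(socket, dst);
        }
        return total;
    }

    public static void main(String[] args) {
        ByteBuffer socket = ByteBuffer.allocate(1024 * 1024);  // 1 MB pending on the "socket"
        ByteBuffer dst = ByteBuffer.allocate(2 * 1024 * 1024); // destination with spare room
        System.out.println("single read: " + readOnce(socket.duplicate(), dst.duplicate()));
        System.out.println("drain loop:  " + readUntilFullOrDrained(socket.duplicate(), dst.duplicate()));
    }
}
```

With 1 MB pending, the single read returns 16384 bytes while the loop returns all 1048576,
which is the behavior the issue title asks for.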

Furthermore, we observe that for users who naively sleep a constant time after each consumer.poll(),
some partitions lag behind after they switch from plaintext to ssl. Here is why this can
happen.

Say there is 1 partition receiving 1 MB/sec and 9 partitions receiving 32 KB/sec each. The leaders
of these partitions are all different and one consumer is consuming all 10 partitions. Let's also
assume that the socket read buffer is large enough and the consumer sleeps 1 sec between calls to
consumer.poll(). 1 sec is long enough for the consumer to receive the FetchResponse back from the broker.

- When the consumer uses plaintext, each consumer.poll() reads all data from the socket buffer,
so each call reads a full second's worth of data, including the 1 MB for the large partition.

- When the consumer uses ssl, each consumer.poll() is likely to find that some data is already
available in memory. In this case the consumer reads only 16 KB from the other sockets, in particular
the socket for the broker with the large partition. The throughput of the large partition
is then limited to 16 KB/sec.
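
The arithmetic behind this scenario can be checked with a small simulation. This is a
simplified model under stated assumptions, not a measurement: it treats each consumer.poll()
as reading at most a fixed number of bytes for the large partition once per second, and the
class and method names (PollLagSketch, backlogAfter) are hypothetical.

```java
public class PollLagSketch {
    /**
     * Returns the unread backlog in bytes after the given number of one-second cycles,
     * assuming one poll() per second that reads at most maxReadPerPoll bytes.
     */
    static long backlogAfter(int seconds, long produceBytesPerSec, long maxReadPerPoll) {
        long backlog = 0;
        for (int s = 0; s < seconds; s++) {
            backlog += produceBytesPerSec;                // data arriving during the sleep
            backlog -= Math.min(backlog, maxReadPerPoll); // what one poll() manages to read
        }
        return backlog;
    }

    public static void main(String[] args) {
        long oneMb = 1024L * 1024L;
        long sixteenKb = 16L * 1024L;
        // Plaintext: poll() drains whatever is in the socket buffer.
        System.out.println("plaintext backlog: " + backlogAfter(60, oneMb, Long.MAX_VALUE));
        // SSL: poll() reads only one 16 KB buffer's worth for the large partition.
        System.out.println("ssl backlog:       " + backlogAfter(60, oneMb, sixteenKb));
    }
}
```

Under these assumptions the plaintext backlog stays at 0, while the ssl backlog grows by
1032192 bytes every second, reaching roughly 59 MB after one minute.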

Arguably, users should not sleep 1 sec if their consumer is lagging behind. But on the Kafka dev side
it would be good to preserve the plaintext behavior and optimize consumer.poll() to read as much data
from the socket as possible.


  was:
When the consumer uses plaintext and there is remaining data in the consumer's buffer, consumer.poll()
will read all data available from the socket buffer into the consumer buffer. However, if the consumer
uses ssl and there is remaining data, consumer.poll() may read only 16 KB (the size of SslTransportLayer.appReadBuffer)
from the socket buffer. This reduces the efficiency of consumer.poll(), because the user must call
poll() more times to get the same amount of data.

Furthermore, we observe that for users who naively sleep a constant time after each consumer.poll(),
some partitions lag behind after they switch from plaintext to ssl. Here is why this can
happen.

Say there is 1 partition receiving 1 MB/sec and 9 partitions receiving 32 KB/sec each. The leaders
of these partitions are all different and one consumer is consuming all 10 partitions. Let's also
assume that the socket read buffer is large enough and the consumer sleeps 1 sec between calls to
consumer.poll(). 1 sec is long enough for the consumer to receive the FetchResponse back from the broker.

When the consumer uses plaintext, each consumer.poll() reads all data from the socket buffer,
so each call reads a full second's worth of data, including the 1 MB for the large partition.

When the consumer uses ssl, each consumer.poll() is likely to find that some data is already
available in memory. In this case the consumer reads only 16 KB from the other sockets, in particular
the socket for the broker with the large partition. The throughput of the large partition
is then limited to 16 KB/sec.

Arguably, users should not sleep 1 sec if their consumer is lagging behind. But on the Kafka dev side
it would be good to preserve the plaintext behavior and optimize consumer.poll() to read as much data
from the socket as possible.



> SSLTransportLayer should keep reading from socket until either the buffer is full or
the socket has no more data
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-6258
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6258
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Dong Lin
>            Assignee: Dong Lin



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
