kafka-jira mailing list archives

From "Jiangjie Qin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KAFKA-5621) The producer should retry expired batches when retries are enabled
Date Thu, 27 Jul 2017 21:47:00 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16103967#comment-16103967 ]

Jiangjie Qin commented on KAFKA-5621:

[~apurva] I am trying to understand the following statement:
On the other hand, for an application, partitions are not really independent (and especially
so if you use transactions). If one partition is down, it makes sense to wait for it to be
ready before continuing. So we would want to handle as many errors internally as possible.
It would mean blocking sends once the queue is too large and not expiring batches in the queue.
This simplifies the application programming model.

Is it really any different between applications and MirrorMaker when a partition cannot make
progress? It seems that in both cases the users would want to know about it at some point and
handle it. I think retries exist for this purpose as well; otherwise we may block forever. If
I understand correctly, what this ticket proposes is just to extend the batch expiration time
from request.timeout.ms to request.timeout.ms * retries, while KIP-91 proposes an additional
explicit configuration for the batch expiration time instead of deriving it from the request
timeout. They are not that different, except that KIP-91 decouples the configurations from
each other.
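As a worked illustration of the arithmetic (request.timeout.ms and retries are real producer settings, but the two formulas below are only this comment's reading of the proposals, not actual Kafka producer code):

```java
// Illustrative arithmetic only: how long a batch may sit before expiring
// under each interpretation. These formulas are this comment's reading of
// the proposals, not actual Kafka producer internals.
public class ExpiryMath {
    // Today: a batch expires after roughly one request.timeout.ms.
    static long currentExpiryMs(long requestTimeoutMs) {
        return requestTimeoutMs;
    }

    // This ticket: the expiration window scales with the retry count.
    static long proposedExpiryMs(long requestTimeoutMs, int retries) {
        return requestTimeoutMs * retries;
    }

    public static void main(String[] args) {
        // e.g. request.timeout.ms=30000, retries=3 -> 90000 ms
        System.out.println(proposedExpiryMs(30_000, 3)); // prints 90000
    }
}
```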

KAFKA-5494 is a good improvement. Regarding the error/anomaly handling, if we are willing
to make public interface changes given that the next release will be 1.0.0, I am thinking of
the following configurations:
1. request.timeout.ms - needed for the wire timeout.
2. expiry.ms - the expiration time for a message. This is an approximate time after which a
message is expired if it cannot be sent out for whatever reason once it is ready for sending
(i.e. once its batch is ready). In the worst case a message would be expired within (expiry.ms
+ request.timeout.ms) after it becomes ready for sending (note that the user defines when a
message is ready for sending via linger.ms and batch.size). expiry.ms should be longer than
request.timeout.ms, e.g. 2x or 3x.

The following configs are optional and will be decided by the producer if not specified:
3. min.retries - when this config is specified, the producer will retry at least min.retries
times, even if that causes the message to stay in the producer longer than expiry.ms. This
avoids the case where the producer cannot retry even once. When retrying, the producer will
do exponential backoff internally. This could default to 1.
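A sketch of what this configuration set might look like as producer properties. Note that expiry.ms and min.retries are only names proposed in this comment, not actual Kafka configs; request.timeout.ms, linger.ms, and batch.size are real producer settings, and the values are placeholders:

```java
import java.util.Properties;

public class ProposedProducerConfig {
    public static Properties build() {
        Properties props = new Properties();
        // Real producer configs:
        props.put("request.timeout.ms", "30000"); // wire timeout
        props.put("linger.ms", "100");            // user-defined readiness
        props.put("batch.size", "16384");
        // Proposed in this comment (hypothetical names, not Kafka configs):
        props.put("expiry.ms", "90000");          // ~3x request.timeout.ms
        props.put("min.retries", "1");            // retry at least once
        return props;
    }

    public static void main(String[] args) {
        Properties p = build();
        // Worst-case expiration after a batch is ready:
        // expiry.ms + request.timeout.ms
        long worstCaseMs = Long.parseLong(p.getProperty("expiry.ms"))
                         + Long.parseLong(p.getProperty("request.timeout.ms"));
        System.out.println(worstCaseMs); // prints 120000
    }
}
```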

Hopefully this gives us a cleaner configuration set for the producer.
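The exponential backoff mentioned for min.retries could behave roughly as follows (a hypothetical sketch: the attempt-doubling schedule and the cap are assumptions, not the producer's actual implementation):

```java
public class RetryBackoff {
    // Doubling backoff per attempt, capped at maxMs, as a sketch of the
    // "exponential backoff internally" behavior described above.
    static long backoffMs(long initialMs, int attempt, long maxMs) {
        long delay = initialMs << Math.min(attempt, 30); // bound the shift
        return Math.min(delay, maxMs);
    }

    public static void main(String[] args) {
        for (int attempt = 0; attempt < 5; attempt++) {
            System.out.println(backoffMs(100, attempt, 1_000));
        }
        // prints 100, 200, 400, 800, 1000
    }
}
```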

> The producer should retry expired batches when retries are enabled
> ------------------------------------------------------------------
>                 Key: KAFKA-5621
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5621
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Apurva Mehta
>             Fix For: 1.0.0
> Today, when a batch is expired in the accumulator, a {{TimeoutException}} is raised to
> the user.
> It might be better for the producer to retry the expired batch up to the configured number
> of retries. This is more intuitive from the user's point of view.
> Further, the proposed behavior makes it easier for applications like MirrorMaker to provide
> ordering guarantees even when batches expire. Today, they would resend the expired batch and
> it would get added to the back of the queue, causing the output ordering to differ from
> the input ordering.

This message was sent by Atlassian JIRA
