kafka-dev mailing list archives

From "Apurva Mehta (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (KAFKA-4558) throttling_test fails if the producer starts too fast.
Date Sat, 07 Jan 2017 02:01:02 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-4558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15806421#comment-15806421 ]

Apurva Mehta edited comment on KAFKA-4558 at 1/7/17 2:00 AM:
-------------------------------------------------------------

So I had a look at the code. All 13 tests that use `ProduceConsumeValidate` have changed
since that commit, so it would be unproductive to revert that change at this point.

Regarding your proposal for two metrics, partitions assigned and per-partition lag: they may
not both be what we want. In particular, in the `ProduceConsumeValidate` test, the producer is
started after the consumer. So if the topic is initially empty, or if the consumer is configured
to read from the end, the lag will always be zero. This is based on my understanding of how lag
is reported, viz. how far the consumer is from the tail of the log. So the lag metric probably
won't be very useful in the majority of cases.
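
To make the lag argument concrete, here is a minimal sketch that computes per-partition lag as
log end offset minus the consumer's position. The helper name `per_partition_lag` and the offsets
are illustrative assumptions, not measurements. In both cases the lag is zero, so a lag of zero
cannot distinguish a ready consumer from one that has not started fetching.

```python
from typing import Dict, Mapping


def per_partition_lag(log_end_offsets: Mapping[int, int],
                      positions: Mapping[int, int]) -> Dict[int, int]:
    # Lag = how far the consumer's next fetch position trails the tail of the log.
    return {p: log_end_offsets[p] - positions[p] for p in positions}


# Case 1: the topic is empty when the consumer starts, so every log end offset is 0.
empty = per_partition_lag({0: 0, 1: 0, 2: 0}, {0: 0, 1: 0, 2: 0})

# Case 2: the topic has data, but the consumer starts at the end of the log
# (e.g. auto.offset.reset=latest), so its position equals the log end offset.
at_end = per_partition_lag({0: 120, 1: 87}, {0: 120, 1: 87})

print(empty)   # {0: 0, 1: 0, 2: 0}
print(at_end)  # {0: 0, 1: 0}
```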

But waiting until the number of assigned partitions is non-zero may be what we want. The tests
I have seen have just a single console consumer for the entire topic, so there should be enough
partitions to go around (of course, this may not be true in the future). At the very least it
will be better than what we have right now. And if there are not enough partitions to go around,
the test will fail early (since the `wait_until` will time out), and can be diagnosed before check-in.
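
A minimal sketch of the gating described above, assuming a polling helper shaped like a
test-framework `wait_until(condition, timeout_sec, backoff_sec, err_msg)`. The version here is a
standalone stand-in, not the framework's own. `FakeConsumer`, `assigned_partitions()` and
`start_producer_when_ready` are hypothetical names used only for illustration; a real test would
read the consumer's metric instead.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout_sec: float,
               backoff_sec: float = 0.1, err_msg: str = "") -> None:
    # Standalone stand-in for a polling helper: re-check `condition` every
    # `backoff_sec` seconds until it holds or the deadline passes.
    deadline = time.monotonic() + timeout_sec
    while time.monotonic() < deadline:
        if condition():
            return
        time.sleep(backoff_sec)
    raise TimeoutError(err_msg)


class FakeConsumer:
    # Hypothetical consumer handle: reports zero assigned partitions until
    # `ready_after_sec` has elapsed, mimicking a group join still in progress.
    def __init__(self, ready_after_sec: float, partitions: int = 3) -> None:
        self._ready_at = time.monotonic() + ready_after_sec
        self._partitions = partitions

    def assigned_partitions(self) -> int:
        return self._partitions if time.monotonic() >= self._ready_at else 0


def start_producer_when_ready(consumer: FakeConsumer,
                              start_producer: Callable[[], None],
                              timeout_sec: float = 30.0) -> None:
    # Block until the consumer owns at least one partition. If it never does,
    # fail fast with a clear message instead of racing the producer.
    wait_until(lambda: consumer.assigned_partitions() > 0,
               timeout_sec=timeout_sec,
               err_msg="consumer was not assigned any partitions in time")
    start_producer()
```

For example, `start_producer_when_ready(FakeConsumer(0.2), producer.start)` (with `producer.start`
standing in for whatever starts the producer) returns once the fake consumer reports its
partitions. A consumer that never gets an assignment raises `TimeoutError` after `timeout_sec`,
which is the early, diagnosable failure mentioned above.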


Regarding implementing partitions assigned alone, I thought it might be worth staging the
implementation by first reading the metric through JMX. This would give us a shorter turnaround
time and validate whether this approach is sufficient to fix the current issues. We can even
experiment with different metrics more quickly if necessary.
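
A rough sketch of the JMX route, assuming the JMX poller writes CSV with a header row of
`"<mbean>:<attribute>"` column names followed by one row per sample. That layout, the client id
`console-consumer`, the sample timestamps and the helper name `latest_metric_value` are all
assumptions for illustration. The MBean `kafka.consumer:type=consumer-coordinator-metrics` with
attribute `assigned-partitions` is the Java consumer's assigned-partitions metric, but the exact
name should be checked against the Kafka version under test.

```python
import csv
import io
from typing import Optional

# Assumed MBean/attribute; the client id is hypothetical.
MBEAN = "kafka.consumer:type=consumer-coordinator-metrics,client-id=console-consumer"
ATTRIBUTE = "assigned-partitions"


def latest_metric_value(csv_text: str, mbean: str, attribute: str) -> Optional[float]:
    # Assumes one header row naming "<mbean>:<attribute>" columns, then one row
    # per sample. Returns the value from the most recent row, or None if absent.
    rows = list(csv.reader(io.StringIO(csv_text)))
    if len(rows) < 2:
        return None
    header = rows[0]
    column = f"{mbean}:{attribute}"
    if column not in header:
        return None
    return float(rows[-1][header.index(column)])


# Illustrative samples: no assignment yet, then three partitions assigned.
sample = (
    f'"time","{MBEAN}:{ATTRIBUTE}"\n'
    "1483754400000,0.0\n"
    "1483754401000,3.0\n"
)
print(latest_metric_value(sample, MBEAN, ATTRIBUTE))  # 3.0
```

A readiness check could then poll `(latest_metric_value(text, MBEAN, ATTRIBUTE) or 0) > 0` until it
holds, and swapping in a different metric only means changing `MBEAN` and `ATTRIBUTE`.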

Finally, would adding an HttpMetricsReporter necessitate a KIP?





> throttling_test fails if the producer starts too fast.
> ------------------------------------------------------
>
>                 Key: KAFKA-4558
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4558
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Apurva Mehta
>            Assignee: Apurva Mehta
>
> As described in https://issues.apache.org/jira/browse/KAFKA-4526, the throttling test
> will fail if the producer in the produce-consume-validate loop starts up before the consumer
> is fully initialized.
> We need to block the start of the producer until the consumer is ready to go.
> The current plan is to poll the consumer for a particular metric (for instance, partition
> assignment) which will act as a good proxy for successful initialization. Currently, we
> just check for the existence of a process with the PID, which is not a strong enough check,
> causing the test to fail intermittently.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
