kafka-dev mailing list archives

From "Jason Gustafson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KAFKA-4689) OffsetValidationTest fails validation with "Current position greater than the total number of consumed records"
Date Thu, 30 Mar 2017 18:13:41 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-4689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15949526#comment-15949526 ]

Jason Gustafson commented on KAFKA-4689:
----------------------------------------

I think I finally see what is happening here. I looked at this test case: http://confluent-systest.s3-website-us-west-2.amazonaws.com/confluent-kafka-0-10-2-system-test-results/?prefix=2017-03-30--001.1490888066--apache--0.10.2--2397269/.


Here is the assertion error:
{code}
AssertionError: Current position 30897 greater than the total number of consumed records 30892
{code}

We are missing 5 records apparently. Looking in the test logs, I found this warning:
{code}
[WARNING - 2017-03-30 11:01:19,615 - verifiable_consumer - _update_global_position - lineno:222]:
Expected next consumed offset of 20037 for partition TopicPartition(topic=u'test_topic', partition=0),
but instead saw 20042
{code}

This suggests a gap in the consumed data. However, in the event output, we see clearly that
these offsets were consumed:
{code}
{"timestamp":1490871669110,"count":5,"name":"records_consumed","partitions":[{"topic":"test_topic","partition":0,"count":5,"minOffset":20037,"maxOffset":20041}]}
{code}
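For context, the gap warning quoted earlier comes from per-partition bookkeeping in the test
harness. A hedged sketch of that check (the function signature and variable names here are
assumptions modeled on verifiable_consumer's _update_global_position, not the actual
implementation):
{code}
import logging

logger = logging.getLogger("verifiable_consumer")

def update_global_position(expected_offsets, partition, min_offset, max_offset):
    """Track the next expected offset per partition and warn on gaps.

    expected_offsets maps a partition to the next offset we expect the
    following records_consumed event to start from; a mismatch means an
    event was lost or reordered on its way to the test framework.
    """
    expected = expected_offsets.get(partition)
    if expected is not None and min_offset != expected:
        logger.warning(
            "Expected next consumed offset of %d for partition %s, "
            "but instead saw %d", expected, partition, min_offset)
    # Advance past the highest offset reported in this event
    expected_offsets[partition] = max_offset + 1
{code}
A lost event does not shift the expectation backward: the tracker simply jumps to
max_offset + 1 of whatever event arrives next, which is why a single missed event
shows up as exactly one gap warning.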

And from the consumer log, we can see the offset is committed, though we do not see the offset
commit in the event output:
{code}
[2017-03-30 11:01:09,110] TRACE Sending OffsetCommit request with {test_topic-0=OffsetAndMetadata{offset=20042,
metadata=''}} to coordinator worker9:9092 (id: 2147483646 rack: null) for group test_group_id
(org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
{code}

We can conclude that both the "records_consumed" event and the "offsets_committed" event were
not delivered to the test framework prior to the hard failure. In general, it seems difficult
to write reliable hard failure test cases which depend on event output from the process experiencing
the hard failure: there is always a delay between the occurrence of an event and its emission
and delivery. We probably need to weaken the test assertions to deal with the possibility
of having missed events, but the test may no longer be useful if we do so.
For the time being, I think we should disable the hard failure scenarios.
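Concretely, weakening the assertion could amount to allowing the position to run ahead of the
consumed count by a bounded amount. A sketch only (the helper name and the slack parameter are
hypothetical, not part of the actual consumer_test.py):
{code}
def assert_position_within_consumed(current_position, total_consumed, slack=0):
    """Check the consumer's position against the records the test saw.

    Under a hard kill, the last few records_consumed events may never be
    delivered, so the position can legitimately run ahead of the count by
    up to `slack` records.
    """
    assert current_position <= total_consumed + slack, \
        "Current position %d greater than the total number of consumed records %d" \
        % (current_position, total_consumed)
{code}
With the failure above, current_position=30897 and total_consumed=30892, so any slack of at
least 5 would have let the run pass; the hard part is choosing a bound that still catches real
duplication or loss bugs.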

> OffsetValidationTest fails validation with "Current position greater than the total number of consumed records"
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-4689
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4689
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients, consumer, system tests
>            Reporter: Ewen Cheslack-Postava
>            Assignee: Jason Gustafson
>              Labels: system-test-failure
>
> {quote}
> ====================================================================================================
> test_id:    kafkatest.tests.client.consumer_test.OffsetValidationTest.test_consumer_bounce.clean_shutdown=False.bounce_mode=all
> status:     FAIL
> run time:   1 minute 49.834 seconds
>     Current position greater than the total number of consumed records
> Traceback (most recent call last):
>   File "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
line 123, in run
>     data = self.run_test()
>   File "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
line 176, in run_test
>     return self.test_context.function(self.test)
>   File "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/mark/_mark.py",
line 321, in wrapper
>     return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/client/consumer_test.py",
line 157, in test_consumer_bounce
>     "Current position greater than the total number of consumed records"
> AssertionError: Current position greater than the total number of consumed records
> {quote}
> See also https://issues.apache.org/jira/browse/KAFKA-3513?focusedCommentId=15791790&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15791790,
another instance of this bug, which indicates the issue goes back at least as far
as 1/17/2017. Note that I don't think we've seen this in 0.10.1 yet.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
