kafka-dev mailing list archives

From "Ismael Juma (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (KAFKA-4825) Likely Data Loss in ReassignPartitionsTest System Test
Date Sun, 05 Mar 2017 19:52:32 GMT

     [ https://issues.apache.org/jira/browse/KAFKA-4825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ismael Juma updated KAFKA-4825:
-------------------------------
    Labels: reliability  (was: )

> Likely Data Loss in ReassignPartitionsTest System Test
> ------------------------------------------------------
>
>                 Key: KAFKA-4825
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4825
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Ben Stopford
>              Labels: reliability
>         Attachments: problem.zip
>
>
> A failure in the test below may indicate a genuinely lost message:
> kafkatest.tests.core.reassign_partitions_test.ReassignPartitionsTest.test_reassign_partitions.bounce_brokers=True.security_protocol=PLAINTEXT
> The test, which reassigns partitions whilst bouncing cluster members, reconciles acknowledged messages against the messages received by the consumer.
> The interesting part is that we received two acks for the same offset, with different messages:
> {"topic":"test_topic","partition":11,"name":"producer_send_success","value":"7447","time_ms":1488349980718,"offset":372,"key":null}
> {"topic":"test_topic","partition":11,"name":"producer_send_success","value":"7487","time_ms":1488349981780,"offset":372,"key":null}
> When searching the log files with kafka.tools.DumpLogSegments, only the later message (7487) is found.
> The missing message lies midway through the test and appears to be lost after a leader move (after 7447 is sent there is a ~1s pause, then 7487 is sent, along with a backlog of messages for partitions 11, 16 and 6).
> The overall implication is that a message appears to have been acknowledged but was later lost.
> Looking at the test itself, it seems valid. The producer is initialised with acks = -1. The onCompletion callback checks for an exception, and the test uses the result to track acknowledgement.
> https://jenkins.confluent.io/job/system-test-kafka/521/console
> http://testing.confluent.io/confluent-kafka-system-test-results/?prefix=2017-03-01--001.1488363091--apache--trunk--c9872cb/ReassignPartitionsTest/test_reassign_partitions/bounce_brokers=True.security_protocol=PLAINTEXT/
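
The symptom described above (two acks for the same offset carrying different values) can be searched for mechanically in the producer output. The following is a minimal sketch, not the system test's own code: it assumes the producer emits one JSON object per line in the shape quoted in the issue, and the function name find_conflicting_acks is hypothetical.

```python
import json
from collections import defaultdict


def find_conflicting_acks(lines):
    """Map (topic, partition, offset) to its acked values, keeping only
    offsets that were acknowledged with more than one distinct value.

    Hypothetical helper; assumes one JSON event per line, as in the
    producer_send_success records quoted in the issue.
    """
    acked = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except ValueError:
            continue  # ignore lines that are not JSON
        if not isinstance(event, dict):
            continue
        if event.get("name") != "producer_send_success":
            continue  # only successful acks are of interest here
        key = (event["topic"], event["partition"], event["offset"])
        if event["value"] not in acked[key]:
            acked[key].append(event["value"])
    # An offset with two or more distinct acked values is suspicious.
    return {k: v for k, v in acked.items() if len(v) > 1}


# The two records quoted in the issue description.
sample = [
    '{"topic":"test_topic","partition":11,"name":"producer_send_success","value":"7447","time_ms":1488349980718,"offset":372,"key":null}',
    '{"topic":"test_topic","partition":11,"name":"producer_send_success","value":"7487","time_ms":1488349981780,"offset":372,"key":null}',
]

conflicts = find_conflicting_acks(sample)
print(conflicts)  # {('test_topic', 11, 372): ['7447', '7487']}
```

Any key in the result names an offset for which at least one acknowledged message cannot be present in the log, so it is a natural starting point for a DumpLogSegments search.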



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
