kafka-dev mailing list archives

From "Paul Whalen (JIRA)" <j...@apache.org>
Subject [jira] [Created] (KAFKA-7941) Connect KafkaBasedLog work thread terminates when getting offsets fails because broker is unavailable
Date Mon, 18 Feb 2019 01:01:08 GMT
Paul Whalen created KAFKA-7941:
----------------------------------

             Summary: Connect KafkaBasedLog work thread terminates when getting offsets fails because broker is unavailable
                 Key: KAFKA-7941
                 URL: https://issues.apache.org/jira/browse/KAFKA-7941
             Project: Kafka
          Issue Type: Bug
    Affects Versions: 2.0.0
            Reporter: Paul Whalen
            Assignee: Paul Whalen


My team has run into this Connect bug regularly over the last six months while doing infrastructure
maintenance that causes intermittent broker unavailability. I'm a little surprised it
exists given how routinely it affects us, so perhaps someone in the know can point out whether
our setup is somehow incorrect. My team is running 2.0.0 on both the broker and the client,
though from reading the code, the issue appears to persist through 2.2;
at least, I was able to write a failing unit test that I believe reproduces it.

When a {{KafkaBasedLog}} worker thread in the Connect runtime calls {{readLogToEnd}} and brokers
are unavailable, the {{TimeoutException}} from the consumer {{endOffsets}} call goes uncaught
all the way up to the top-level {{catch (Throwable t)}}, effectively killing the thread until
Connect is restarted. As a result, Connect stops functioning entirely, with no indication
other than that single log line; tasks still show as running.
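The failure mode can be modeled with a small, self-contained sketch. It is a hypothetical simplification, not the real {{KafkaBasedLog}} code: {{BrokerTimeoutException}} stands in for Kafka's unchecked {{org.apache.kafka.common.errors.TimeoutException}}, a {{Supplier}} stands in for the consumer {{endOffsets}} call, and {{runScenario}} and the log text are illustrative names only. The point it shows is that an unchecked exception thrown inside the loop skips straight to the outer {{catch (Throwable t)}}, after which the thread simply ends.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/**
 * Hypothetical, heavily simplified model of the failure mode in KAFKA-7941.
 * This is NOT the real KafkaBasedLog code; every name here is illustrative.
 */
public class WorkThreadFailureDemo {

    /** Hypothetical stand-in for Kafka's unchecked org.apache.kafka.common.errors.TimeoutException. */
    static class BrokerTimeoutException extends RuntimeException {
        BrokerTimeoutException(String message) {
            super(message);
        }
    }

    /**
     * Runs a work thread that "reads to the end of the log" `reads` times.
     * The simulated endOffsets call times out on call number `failOnCall`.
     * Returns how many reads completed before the thread stopped.
     */
    static int runScenario(int failOnCall, int reads) {
        AtomicInteger calls = new AtomicInteger();
        AtomicInteger completed = new AtomicInteger();
        // Simulated consumer endOffsets(...): succeeds except when the "broker" is down.
        Supplier<Long> endOffsets = () -> {
            if (calls.incrementAndGet() == failOnCall) {
                throw new BrokerTimeoutException("simulated broker timeout");
            }
            return 42L;
        };
        Thread worker = new Thread(() -> {
            try {
                for (int i = 0; i < reads; i++) {
                    endOffsets.get(); // nothing inside the loop catches the timeout
                    completed.incrementAndGet();
                }
            } catch (Throwable t) {
                // Analogue of the top-level catch (Throwable t): one log line, then the thread exits.
                System.err.println("ERROR work thread stopped: " + t);
            }
        });
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }

    public static void main(String[] args) {
        // Two reads succeed, the third times out, and reads four and five never happen.
        System.out.println("completed reads: " + runScenario(3, 5));
    }
}
```

Running {{main}} prints {{completed reads: 2}}: nothing outside the thread is told that it stopped, which matches the "tasks still show as running" symptom.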

The proposed fix is to simply catch and log the {{TimeoutException}}, allowing the worker
thread to retry forever.
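A minimal sketch of the proposed fix, under the same hypothetical stand-ins ({{BrokerTimeoutException}} for Kafka's {{TimeoutException}}, a {{Supplier}} for {{endOffsets}}): the timeout is caught and logged inside the work loop and the call is retried, so it never reaches the top-level {{catch (Throwable t)}}. {{endOffsetsWithRetry}}, {{runScenario}} and the fixed 10 ms backoff are illustrative assumptions, not the actual Kafka patch.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/**
 * Hypothetical sketch of the fix proposed in KAFKA-7941: catch and log the timeout
 * inside the work loop and retry, instead of letting it reach the top-level catch.
 * Not the actual Kafka patch; names and the fixed backoff are illustrative.
 */
public class RetryOnTimeoutDemo {

    /** Hypothetical stand-in for Kafka's unchecked org.apache.kafka.common.errors.TimeoutException. */
    static class BrokerTimeoutException extends RuntimeException {
        BrokerTimeoutException(String message) {
            super(message);
        }
    }

    /** Calls endOffsets until it succeeds; each timeout is logged and retried after a fixed backoff. */
    static long endOffsetsWithRetry(Supplier<Long> endOffsets, long backoffMs) throws InterruptedException {
        while (true) {
            try {
                return endOffsets.get();
            } catch (BrokerTimeoutException e) {
                System.err.println("WARN timed out reading end offsets, retrying: " + e.getMessage());
                Thread.sleep(backoffMs); // avoid a hot loop while the broker is down
            }
        }
    }

    /** The "broker" is unavailable for the first `outageCalls` calls; returns completed reads. */
    static int runScenario(int outageCalls, int reads) {
        AtomicInteger calls = new AtomicInteger();
        AtomicInteger completed = new AtomicInteger();
        Supplier<Long> endOffsets = () -> {
            if (calls.incrementAndGet() <= outageCalls) {
                throw new BrokerTimeoutException("simulated broker timeout");
            }
            return 42L;
        };
        Thread worker = new Thread(() -> {
            try {
                for (int i = 0; i < reads; i++) {
                    endOffsetsWithRetry(endOffsets, 10);
                    completed.incrementAndGet();
                }
            } catch (Throwable t) {
                // Still reached for genuinely unexpected errors (or interruption), but not for timeouts.
                System.err.println("ERROR work thread stopped: " + t);
            }
        });
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }

    public static void main(String[] args) {
        // Three timed-out calls are retried; all five reads eventually complete.
        System.out.println("completed reads: " + runScenario(3, 5));
    }
}
```

Because the retry is unbounded, as proposed, a broker that never comes back leaves the thread retrying and logging rather than dying silently; the backoff only keeps that from spinning.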

Alternatively, perhaps Connect is not expected to recover from broker unavailability,
though that would be disappointing. I would at least hope for a louder failure than the
single {{ERROR}} log line.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
