kafka-jira mailing list archives

From "Charles Crain (JIRA)" <j...@apache.org>
Subject [jira] [Created] (KAFKA-6249) Interactive query downtime when node goes down even with standby replicas
Date Tue, 21 Nov 2017 14:47:00 GMT
Charles Crain created KAFKA-6249:
------------------------------------

             Summary: Interactive query downtime when node goes down even with standby replicas
                 Key: KAFKA-6249
                 URL: https://issues.apache.org/jira/browse/KAFKA-6249
             Project: Kafka
          Issue Type: Bug
          Components: streams
    Affects Versions: 1.0.0
            Reporter: Charles Crain


In a multi-node Kafka Streams application that uses interactive queries, the queryable store
will become unavailable (throw InvalidStateStoreException) for up to several minutes when
a node goes down.  This happens regardless of how many nodes the application runs on and
how many standby replicas are configured.
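For reference, the standby-replica count mentioned above is the Streams setting `num.standby.replicas`, and interactive queries across nodes rely on each instance advertising itself via `application.server`. The sketch below builds such a configuration with plain `java.util.Properties` so it compiles against the JDK alone. The class and method names and all argument values are hypothetical placeholders. In a real application these properties would be passed to the `KafkaStreams` constructor, usually via the `StreamsConfig` constants rather than string keys. Per this report, raising the standby count did not shorten the outage in 1.0.0.

```java
import java.util.Properties;

public class StandbyConfigSketch {

    // Hypothetical helper: builds illustrative Kafka Streams settings; all argument values are placeholders.
    static Properties streamsProps(String applicationId, String bootstrapServers,
                                   String advertisedHostPort, int standbyReplicas) {
        Properties props = new Properties();
        // Every instance of the same Streams application shares this id.
        props.setProperty("application.id", applicationId);
        props.setProperty("bootstrap.servers", bootstrapServers);
        // Number of standby replicas kept (on other instances) for each task's state.
        props.setProperty("num.standby.replicas", Integer.toString(standbyReplicas));
        // host:port this instance advertises so that peers can route interactive queries to it.
        props.setProperty("application.server", advertisedHostPort);
        return props;
    }

    public static void main(String[] args) {
        Properties props = streamsProps("orders-app", "broker-1:9092", "node-1:8080", 1);
        System.out.println("num.standby.replicas = " + props.getProperty("num.standby.replicas"));
    }
}
```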

My expectation is that, if a standby replica is present, interactive queries would fail over
to the live replica immediately, causing negligible downtime for interactive queries.
Instead, the queryable store appears to stay unavailable for as long as the nodes take to
rebalance completely (a few minutes for a couple of GB of total data in the queryable store's
backing topic).
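As an illustration only (the report itself describes no workaround), a caller can retry a store lookup with backoff while it throws `InvalidStateStoreException`. This keeps individual requests from failing immediately during a brief unavailability, but it does not shorten a multi-minute rebalance like the one described above. The sketch assumes a JDK-only setting: `StoreUnavailableException` is a hypothetical stand-in for `org.apache.kafka.streams.errors.InvalidStateStoreException`, and `queryWithRetry` is a hypothetical helper, not part of the Kafka Streams API. In a real application the `Supplier` would wrap the `KafkaStreams#store(...)` call and the read from the returned store.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for org.apache.kafka.streams.errors.InvalidStateStoreException,
// so that this sketch compiles against the JDK alone.
class StoreUnavailableException extends RuntimeException {
    StoreUnavailableException(String message) {
        super(message);
    }
}

public class StoreQueryRetry {

    // Hypothetical helper: calls `query` until it succeeds or `maxAttempts` calls have failed,
    // doubling the sleep between attempts (capped at 5 s) while the store is unavailable.
    static <T> T queryWithRetry(Supplier<T> query, int maxAttempts, long initialBackoffMs) {
        long backoffMs = initialBackoffMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return query.get();
            } catch (StoreUnavailableException e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up and surface the last failure to the caller
                }
                try {
                    Thread.sleep(backoffMs);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve the interrupt status
                    e.addSuppressed(ie);
                    throw e;
                }
                backoffMs = Math.min(backoffMs * 2, 5_000L);
            }
        }
    }

    public static void main(String[] args) {
        // Simulated lookup that fails twice (as if the store were migrating), then succeeds.
        int[] calls = {0};
        Supplier<String> lookup = () -> {
            calls[0]++;
            if (calls[0] <= 2) {
                throw new StoreUnavailableException("store is migrating");
            }
            return "value-for-key";
        };
        String result = queryWithRetry(lookup, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts"); // value-for-key after 3 attempts
    }
}
```

With `maxAttempts = 5` and a 10 ms initial backoff, the total sleep is bounded at 10 + 20 + 40 + 80 = 150 ms, so a caller still fails fast if the store stays down for minutes, which is exactly the situation this report describes.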

I am filing this as a bug, realizing that it may in fact be a feature request.  However, until
there is a way to use interactive queries with minimal (~zero) downtime on node failure,
we have to consider other strategies for serving queries (e.g. manually materializing
the topic to an external resilient store such as Cassandra) in order to meet our SLAs.

If there is a way to minimize the downtime of interactive queries on node failure that I am
missing, I would like to know what it is.

Our team is super-enthusiastic about Kafka Streams and we're keen to use it for just about
everything!  This is our only major roadblock.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
