flink-user mailing list archives

From Ufuk Celebi <...@apache.org>
Subject Re: UnknownKvStateKeyGroupLocation
Date Tue, 16 May 2017 12:41:33 GMT
Hey Joe! This sounds odd... are there any failures (JobManager or
TaskManager) or leader elections being reported? You should see such
events in the JobManager/TaskManager logs.
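
To line client-side failures up against those log events, it may help to timestamp every query attempt and count outcomes by exception type. The following is only a minimal sketch: the class QueryProbe, the method probe, and the Callable standing in for the query are hypothetical names, not part of Flink. In a real client the Callable would wrap the getKvState call and wait for its result.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;

public class QueryProbe {

    /**
     * Runs the query `attempts` times, printing a timestamped outcome for each
     * attempt and returning a count per outcome ("OK" or the exception's class name).
     * The Callable is a stand-in (assumption) for the actual queryable state call.
     */
    public static Map<String, Integer> probe(Callable<byte[]> query, int attempts, long pauseMillis)
            throws InterruptedException {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int i = 0; i < attempts; i++) {
            String outcome;
            try {
                query.call();
                outcome = "OK";
            } catch (Exception e) {
                // Record the failure type, e.g. UnknownKvStateKeyGroupLocation vs. a timeout.
                outcome = e.getClass().getSimpleName();
            }
            // The timestamp lets each attempt be matched with JobManager/TaskManager log entries.
            System.out.println(Instant.now() + " attempt " + i + ": " + outcome);
            counts.merge(outcome, 1, Integer::sum);
            Thread.sleep(pauseMillis);
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        // Simulated query (assumption): every second call fails, to exercise both paths.
        int[] calls = {0};
        Callable<byte[]> fakeQuery = () -> {
            if (calls[0]++ % 2 == 1) {
                throw new IllegalStateException("simulated lookup failure");
            }
            return new byte[0];
        };
        System.out.println(probe(fakeQuery, 4, 10L));
    }
}
```

If the failure timestamps cluster around leader elections or task restarts in the logs, that would point to the location lookup being temporarily out of date rather than to a problem in the job itself.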

On Tue, May 16, 2017 at 2:28 PM, Joe Olson <jo4243@outlook.com> wrote:
> When running Flink in high availability mode, I've been seeing a high number
> of UnknownKvStateKeyGroupLocation errors being returned when using queryable
> state calls.
> If I put a simple getKvState call into a loop executing every second, and
> call it repeatedly, sometimes I will get the expected results, sometimes I
> will get UnknownKvStateKeyGroupLocation thrown. This is not associated with
> a query timeout (network issue).
> From looking at the Flink source code, this problem stems from
> lookup.getKvStateServerAddress returning null. I know all the task managers
> are registering state with the job manager, because I see the "Key value
> state registered for job xx under name yy" messages in the job server log.
> Anything else I should be looking for? I have several jobs I am querying
> state on, and this seems isolated to only one. I've gone over the differences
> between the jobs very closely, but they are all built from the same template.
> What would cause a lookup.getKvStateServerAddress to sometimes succeed, and
> sometimes to fail?
