From: Andrey Zagrebin
Date: Wed, 15 May 2019 17:29:32 +0200
Subject: Re: flink 1.4.2. java.lang.IllegalStateException: Could not initialize operator state backend
To: anaray
Cc: user

Hi,

I am not sure that FLINK-8836 is related to the failure in the stack trace.

You say you are using Flink in production. Does that mean it always worked and has only recently started to fail?

From the stack trace, it looks like the arity of some Tuple type changed in some operator state. The number of tuple fields could have increased after a job restart. In that case, Flink expects tuples with more fields to be stored in the checkpoint and fails. Such a change would require an explicit state migration. Could that be the case? When did the failure start to happen, and why was the operator state restored? Job restart?

Best,
Andrey
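
To illustrate the arity mismatch described above, here is a minimal sketch in plain Java. It does not use Flink's actual tuple serializer: `writeTuple`, `readTuple`, and `restoreFails` are hypothetical helpers that only mimic the idea of writing N fields to a snapshot and then trying to read back more than N.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;

public class TupleArityMismatch {

    // Hypothetical stand-in for a tuple serializer: writes each field in order.
    static byte[] writeTuple(String... fields) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (String field : fields) {
                out.writeUTF(field);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads back exactly `arity` fields, as a reader built for that arity would.
    static String[] readTuple(byte[] data, int arity) throws IOException {
        String[] fields = new String[arity];
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            for (int i = 0; i < arity; i++) {
                fields[i] = in.readUTF();
            }
        }
        return fields;
    }

    // Returns true if reading with the given arity runs past the stored data.
    static boolean restoreFails(byte[] data, int arity) {
        try {
            readTuple(data, arity);
            return false;
        } catch (EOFException e) {
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "Checkpoint" taken while the state type still had two fields.
        byte[] checkpoint = writeTuple("key", "value");

        System.out.println("restore with arity 2 fails: " + restoreFails(checkpoint, 2));
        System.out.println("restore with arity 3 fails: " + restoreFails(checkpoint, 3));
    }
}
```

In this sketch the two-field read succeeds, while the three-field read runs out of bytes. The real failure mode in Flink depends on the serializer involved, so treat this only as an analogy for why changing a tuple's field count without a state migration can break a restore.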