Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 16A3F200C21 for ; Mon, 20 Feb 2017 15:23:11 +0100 (CET) Received: by cust-asf.ponee.io (Postfix) id 15352160B73; Mon, 20 Feb 2017 14:23:11 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 3E85D160B62 for ; Mon, 20 Feb 2017 15:23:10 +0100 (CET) Received: (qmail 68690 invoked by uid 500); 20 Feb 2017 14:23:03 -0000 Mailing-List: contact user-help@flink.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@flink.apache.org Delivered-To: mailing list user@flink.apache.org Received: (qmail 68680 invoked by uid 99); 20 Feb 2017 14:22:56 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd4-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 20 Feb 2017 14:22:56 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd4-us-west.apache.org (ASF Mail Server at spamd4-us-west.apache.org) with ESMTP id A789AC0258 for ; Mon, 20 Feb 2017 14:22:55 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd4-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: -0.02 X-Spam-Level: X-Spam-Status: No, score=-0.02 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01] autolearn=disabled Authentication-Results: spamd4-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=data-artisans-com.20150623.gappssmtp.com Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd4-us-west.apache.org [10.40.0.11]) (amavisd-new, port 10024) with ESMTP id 4xjBM3BU7guE for ; Mon, 20 Feb 2017 14:22:53 +0000 (UTC) Received: from mail-wm0-f52.google.com (mail-wm0-f52.google.com [74.125.82.52]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTPS id 30DD35F477 for ; Mon, 20 Feb 2017 14:22:53 +0000 (UTC) Received: by mail-wm0-f52.google.com with SMTP id c85so80862592wmi.1 for ; Mon, 20 Feb 2017 06:22:53 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=data-artisans-com.20150623.gappssmtp.com; s=20150623; h=from:content-transfer-encoding:mime-version:subject:date:references :to:in-reply-to:message-id; bh=wS0hB7qO8UYPgeXETy5rfdKPbRkc6zysGi/TKNoUG7E=; b=uSFscHrYpVHo7SCt/tJm/tuVEq9LDUrRqzjtJydrbE+ktabS5NtIgXXDuIWteTy0b3 wSt83E9y+dzqhj3tT1ZksY8LbEy3uIKsa/t2/FjhPPpNNwww7pZ1Q8oHJxNNQT+h3t5y ZNbGWv9+9WXYWXiKa5pbg7FWZ1H/Xdo+8xY5PhzI5QlzIWh9D7HEa72MDLqkpqZ0On5j 5qfyU+cmu9sBhpRCQ1DGW2lEtqT6LnDsDOAIsyJs95UHyRcQoJKfv2VYqEb6vPC5y4NL T9PdaaT/ke5F8TLATwKe86bHQEWQlvOK2q9RuN/qHAYAvChGe7sL/EdsrZ22ZXh+LzYW adrg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:content-transfer-encoding:mime-version :subject:date:references:to:in-reply-to:message-id; bh=wS0hB7qO8UYPgeXETy5rfdKPbRkc6zysGi/TKNoUG7E=; b=fVTWqsaYHDC/kRD+/v7CojbT0GnhBdrdZ6vKynvvGRi9W8nPl3q00RL8XuZFpqpdNl tg6NzRHCUH7Rma0QF4K4l5iChxwpvHNwM/MZs135IzoUXbThEX2d8LVeVGopfMEFYWdj 9O2fcrZDI82yMKXq+zucvbxY84ukU10kJ2ApOO+VAEnQGe6cCg5aECeBFivAeQSUz9PP 1YIrZx9QyPSmuoXoJh1KcDQIwnYDoegt1Y9RLWz3C9uv43CZxGFtfG6/BoTmpkB8VCiz Dgp8wQ58LhTg4RknJa6CPB7AkISEPYFoJVKErZO8vWUkpb47GJ7EMcm+oqUdpSEIgUgB SsdQ== X-Gm-Message-State: AMke39khGiylVOWXRrf6HEjd9oZ5f8w+GIBxsX00phXfK5EUn18lphSWmHg3m5rfTYBTRKtR X-Received: by 10.28.181.145 with SMTP id e139mr18168792wmf.127.1487600570734; Mon, 20 Feb 2017 06:22:50 -0800 (PST) Received: from skynet.fritz.box (ipservice-092-219-046-010.092.219.pools.vodafone-ip.de. [92.219.46.10]) by smtp.gmail.com with ESMTPSA id o2sm24949367wra.42.2017.02.20.06.22.49 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 20 Feb 2017 06:22:50 -0800 (PST) From: Stefan Richter Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Mime-Version: 1.0 (Mac OS X Mail 10.2 \(3259\)) Subject: Re: Previously working job fails on Flink 1.2.0 Date: Mon, 20 Feb 2017 15:22:49 +0100 References: <681a0422-0ede-cf4c-02d9-f58f09044052@hausmann-family.de> To: user@flink.apache.org In-Reply-To: <681a0422-0ede-cf4c-02d9-f58f09044052@hausmann-family.de> Message-Id: X-Mailer: Apple Mail (2.3259) archived-at: Mon, 20 Feb 2017 14:23:11 -0000 Hi, Flink 1.2 is partitioning all keys into key-groups, the atomic units for = rescaling. This partitioning is done by hash partitioning and is also in = sync with the routing of tuples to operator instances (each parallel = instance of a keyed operator is responsible for some range of key = groups). This exception means that Flink detected a tuple in the state = backend of a parallel operator instance that should not be there = because, by its key hash, it belongs to a different key-group. Or = phrased differently, this tuple belongs to a different parallel operator = instance. If this is a Flink bug or user code bug is very hard to tell, = the log also does not provide additional insights. I could see this = happen in case that your keys are mutable and your code makes some = changes to the object that change the hash code. Another question is = also: did you migrate your job from Flink 1.1.3 through an old savepoint = or did you do a fresh start. Other than that, I can recommend to check = your code for mutating of keys. If this fails deterministically, you = could also try to set a breakpoint for the line of the exception and = take a look if the key that is about to be inserted is somehow special. Best, Stefan=20 > Am 20.02.2017 um 14:32 schrieb Steffen Hausmann = : >=20 > Hi there, >=20 > I=E2=80=99m having problems running a job on Flink 1.2.0 that = successfully executes on Flink 1.1.3. The job is supposed to read events = from a Kinesis stream and to send outputs to Elasticsearch and it = actually initiates successfully on a Flink 1.2.0 cluster running on = YARN, but as soon as I start to ingest events into the Kinesis stream, = the job fails (see the attachment for more information): >=20 > java.lang.RuntimeException: Unexpected key group index. This indicates = a bug. >=20 > at = org.apache.flink.runtime.state.heap.StateTable.set(StateTable.java:57) >=20 > at = org.apache.flink.runtime.state.heap.HeapListState.add(HeapListState.java:9= 8) >=20 > at = org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.proc= essElement(WindowOperator.java:372) >=20 > at = org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(St= reamInputProcessor.java:185) >=20 > at = org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputSt= reamTask.java:63) >=20 > at = org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java= :272) >=20 > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:655) >=20 > at java.lang.Thread.run(Thread.java:745) >=20 > Any ideas what=E2=80=99s going wrong here? The job executes = successfully when it=E2=80=99s compiled against the Flink 1.1.3 = artifacts and run on a Flink 1.1.3 cluster. Does this indicate a bug in = my code or is this rather a bug in Flink? How can I further debug this? >=20 > Any guidance is highly appreciated. >=20 > Thanks, >=20 > Steffen >=20 >