From: Chris Berry
To: user@ignite.apache.org
Date: Sat, 3 Jun 2017 10:47:38 -0700 (MST)
Subject: Ignite failing catastrophically

Hi,

I have a big problem. Ignite is failing catastrophically for me.

This is the scenario:

We start a cluster of 15 Ignite server nodes. These are initially empty.

Then some Kafka feeds are enabled that stream data into 4 independent caches simultaneously (using DataStreamers). Each cache is configured as a PARTITIONED cache with 1 primary and 2 backups. The feeds attempt to load ~0.5M entries into each cache. They are streamed into the caches from a client node on 4 threads.

Almost always a node will fail during this operation, and this leads to a catastrophic, cascading failure of the entire cluster.

But on the failing nodes there is no information whatsoever as to what caused the failure. Nothing. No OOM. No exceptions. Nothing. The logs simply stop. I have GC logging enabled, and there are no long pauses. So I am baffled.

I have tried increasing memory.

I have tried increasing timeouts to ridiculous numbers:

```
COMPUTE_TASK_TIMEOUT=5000
DISCOVERY_ACK_TIMEOUT=30000
DISCOVERY_JOIN_TIMEOUT=120000
DISCOVERY_MAX_ACK_TIMEOUT=37000
DISCOVERY_NETWORK_TIMEOUT=120000
FAILURE_DETECTION_TIMEOUT=120000
IGNITE_LOG_LEVEL=INFO
IGNITE_LONG_OPERATIONS_DUMP_TIMEOUT=200000
IGNITE_QUIET=false
```

But nothing helps.

What can I do to get better information out of Ignite? It is basically failing silently.
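For a sense of scale, here is a rough sizing sketch in Java (Ignite's own language). It only does arithmetic on the numbers in the scenario: 4 caches of roughly 0.5M entries each, 1 primary plus 2 backup copies per entry, and 15 server nodes. It assumes the entries end up evenly spread across the server nodes, which is an approximation; the class and method names are hypothetical, and the per-entry size in bytes is not known from the post, so no memory figure is derived.

```java
// Back-of-the-envelope estimate of the initial streaming load from the scenario.
// Assumption: entries are spread evenly across server nodes (a simplification;
// real partition distribution varies somewhat).
public class StreamingLoadEstimate {

    public static final int SERVER_NODES = 15;
    public static final int CACHES = 4;
    public static final long ENTRIES_PER_CACHE = 500_000L; // "~0.5M entries into each cache"
    public static final int BACKUPS = 2;                   // 1 primary + 2 backups

    /** Total entry copies held cluster-wide (primaries plus backups). */
    public static long totalCopies() {
        return CACHES * ENTRIES_PER_CACHE * (1 + BACKUPS);
    }

    /** Average number of entry copies each server node must store, assuming an even spread. */
    public static long copiesPerNode() {
        return totalCopies() / SERVER_NODES;
    }

    public static void main(String[] args) {
        System.out.println("Total entry copies: " + totalCopies());
        System.out.println("Average per node:   " + copiesPerNode());
    }
}
```

Running it prints 6000000 total entry copies and 400000 per node on average, so each server absorbs on the order of 400K primary or backup writes while all 4 feeds run at once.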
Are there some tuning parameters that I am missing? I would be happy to supply further config information.

This is with Ignite 2.0.0.

We have invested quite a bit of effort to get Ignite running for our application, and this is a show-stopper for us.

NOTE: this does not happen with the smaller feeds that we have in our dev environment.

Thanks,
-- Chris

--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Ignite-failing-catastrophically-tp13357.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.