From: Ray
To: user@ignite.apache.org
Date: Mon, 29 Oct 2018 23:21:35 -0700 (MST)
Subject: Create index got stuck and freeze whole cluster.

I'm using a five-node Ignite 2.6 cluster.

When I try to create an index on a table with 10 million records using the SQL statement "create index on table(a,b,c,d)", the whole cluster freezes and prints the following log message for 40 minutes.

[2018-10-30T02:48:44,086][WARN ][exchange-worker-#162][GridDhtPartitionsExchangeFuture] Unable to await partitions release latch within timeout: ServerLatch [permits=4, pendingAcks=[20aa5929-3f26-4923-87a3-27b4f6d4f744, ec5be25e-6601-468c-9f0e-7ab7c8caa9e9, 45819b05-a338-4bc4-b104-f0c7567fd49d, cbb80db7-b342-4b97-ba61-97d57c194a1a], super=CompletableLatch [id=exchange, topVer=AffinityTopologyVersion [topVer=202, minorTopVer=1]]]

I noticed that one of the servers (log in server3.zip) is stuck in the checkpoint process, and this server acts as the coordinator in PME.

In its log I see that only 856610 pages need to be flushed to disk, but the checkpoint takes 32 minutes to finish, while another node takes 7 minutes to finish writing 919060 pages to disk.

Also, the disk usage on the slow checkpoint server is not at 100%.

Here are the full log files for all 5 servers:

server1.zip
server2.zip
server3.zip
server4.zip
server5.zip
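
To put numbers on the checkpoint difference, here is a rough calculation from the page counts and durations quoted above. It is only a sketch: it assumes the 32 and 7 minutes are wall-clock checkpoint times, and it assumes a 4 KB page size (the Ignite default as far as I know; the configured value on this cluster is not stated here). The function and variable names are made up for illustration.

```python
# Rough checkpoint throughput from the figures quoted in this post.
# ASSUMPTION: 4 KB pages; the cluster's configured page size is not given above.
ASSUMED_PAGE_SIZE_BYTES = 4096


def pages_per_second(pages: int, minutes: float) -> float:
    """Average number of pages flushed per second over the checkpoint."""
    return pages / (minutes * 60)


def mib_per_second(pages: int, minutes: float,
                   page_size: int = ASSUMED_PAGE_SIZE_BYTES) -> float:
    """Average write rate in MiB/s, under the assumed page size."""
    return pages_per_second(pages, minutes) * page_size / (1024 * 1024)


# Slow node (the PME coordinator): 856610 pages in 32 minutes.
slow_rate = pages_per_second(856610, 32)
# Another node: 919060 pages in 7 minutes.
fast_rate = pages_per_second(919060, 7)
# How many times faster the other node flushed its pages.
ratio = fast_rate / slow_rate

print(f"slow node: {slow_rate:.0f} pages/s, {mib_per_second(856610, 32):.2f} MiB/s")
print(f"fast node: {fast_rate:.0f} pages/s, {mib_per_second(919060, 7):.2f} MiB/s")
print(f"fast/slow ratio: {ratio:.1f}x")
```

So the slow node averaged roughly 446 pages per second against roughly 2188 on the other node, a bit under a 5x difference. Even under the assumed page size that is only a few MiB/s, which fits with the disk on the slow server not being at 100% usage.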