Date: Sat, 7 Apr 2018 06:03:00 +0000 (UTC)
From: "Joel Lang (JIRA)"
To: issues@ignite.apache.org
Reply-To: dev@ignite.apache.org
Subject: [jira] [Comment Edited] (IGNITE-7812) Slow rebalancing in case of enabled persistence

[ https://issues.apache.org/jira/browse/IGNITE-7812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429252#comment-16429252 ]

Joel Lang edited comment on IGNITE-7812 at 4/7/18 6:02 AM:
-----------------------------------------------------------

I've found that this happens after page eviction begins for the cache. It seems to be related to file reads in PageMemoryImpl.acquirePage(), primarily as a result of the H2 query index being updated when new rows are received during rebalancing. The segment write lock is held during these read operations, which dramatically slows down other operations. The slowdown was crippling in my test case (50x or more), which ran on a development VM that uses an HDD, not an SSD. In that case we would be looking at 1-2 days of rebalance time for a cache with 6 million entries and 2 indexes. While we do plan to use an SSD for production, this is still an excessive slowdown.
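
As a rough illustration of the locking pattern described above, the following Java sketch contrasts a page lookup that performs its disk read while holding a segment write lock with a variant that performs the read outside the lock. This is not Ignite's code: the class SegmentSketch, its methods, and the simulated readFromDisk() are hypothetical, and a plain ReentrantReadWriteLock stands in for whatever lock PageMemoryImpl actually uses.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical illustration only; names and structure are NOT taken from Ignite.
final class SegmentSketch {
    private final ReentrantReadWriteLock segLock = new ReentrantReadWriteLock();
    private final Map<Long, byte[]> resident = new HashMap<>();

    // Recorded by the simulated disk read: was the write lock held at that moment?
    boolean lastReadHeldWriteLock;

    // Stand-in for a slow page-store read (assumed to be the expensive step).
    private byte[] readFromDisk(long pageId) {
        lastReadHeldWriteLock = segLock.isWriteLockedByCurrentThread();
        return new byte[] {(byte) pageId};
    }

    // Pattern A: the disk read happens while the write lock is held, so every
    // other thread that needs this segment waits for the I/O to finish.
    byte[] acquireReadingUnderLock(long pageId) {
        segLock.writeLock().lock();
        try {
            byte[] page = resident.get(pageId);
            if (page == null) {
                page = readFromDisk(pageId);
                resident.put(pageId, page);
            }
            return page;
        } finally {
            segLock.writeLock().unlock();
        }
    }

    // Pattern B: check under the lock, release it, read, then publish under the lock.
    byte[] acquireReadingOutsideLock(long pageId) {
        segLock.writeLock().lock();
        try {
            byte[] page = resident.get(pageId);
            if (page != null)
                return page;
        } finally {
            segLock.writeLock().unlock();
        }

        byte[] loaded = readFromDisk(pageId); // no segment lock held here

        segLock.writeLock().lock();
        try {
            // Another thread may have loaded the page meanwhile; keep the first copy.
            byte[] prev = resident.putIfAbsent(pageId, loaded);
            return prev != null ? prev : loaded;
        } finally {
            segLock.writeLock().unlock();
        }
    }
}
```

In pattern B two threads can occasionally read the same page twice; a real implementation would need some way to coordinate concurrent loads of one page, which this sketch leaves out.
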

Fortunately, it seems this has already been fixed for 2.5. The disk read now happens after the segment write lock is released: [PageMemoryImpl.java#L750|https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/pagemem/PageMemoryImpl.java#L750]

was (Author: langj):
I've found that this happens after page eviction begins for the cache. It seems to be related to file reads in PageMemoryImpl.acquirePage(), primarily as a result of the H2 query index being updated when new rows are received during rebalancing. The segment write lock is held during these read operations, which dramatically slows down other operations. The slowdown was crippling in my test case (50x or more), which ran on a development VM that uses an HDD, not an SSD. In that case we would be looking at 1-2 days of rebalance time for a cache with 6 million entries and 2 indexes.

Fortunately, it seems this has already been fixed for 2.5. The disk read now happens after the segment write lock is released: [PageMemoryImpl.java#L750|https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/pagemem/PageMemoryImpl.java#L750]

> Slow rebalancing in case of enabled persistence
> -----------------------------------------------
>
>                 Key: IGNITE-7812
>                 URL: https://issues.apache.org/jira/browse/IGNITE-7812
>             Project: Ignite
>          Issue Type: Task
>            Reporter: Alexey Goncharuk
>            Assignee: Pavel Kovalenko
>            Priority: Major
>
> A user reported that rebalancing takes significantly longer when persistence is enabled, even in LOG_ONLY mode.
> Need to investigate how the performance of rebalancing can be improved.
> Also, it would be great to estimate the benefit of file transfer for rebalancing.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)