Date: Tue, 20 Feb 2018 14:33:45 -0700 (MST)
From: Dave Harvey
To: user@ignite.apache.org
Message-ID: <1519162425930-0.post@n6.nabble.com>
In-Reply-To: <1518582253663-0.post@n6.nabble.com>
References: <1518518498682-0.post@n6.nabble.com> <1518582253663-0.post@n6.nabble.com>
Subject: Re: 20 minute 12x throughput drop using data streamer and Ignite persistence

I've started reproducing this issue with more statistics. I have not yet reached the worst performance point, but some things are starting to become clearer.

The DataStreamer hashes the affinity key to a partition, then maps the partition to a node, and fills a single buffer at a time for that node. A DataStreamer thread on the node therefore gets a buffer's worth of requests grouped by the time of the addData() call, with no per-thread grouping by affinity key (as I had originally assumed).

The test I was running used a large amount of data where the average number of keys per unique affinity key is 3, with some outliers up to 50K. One of the caches being updated in the optimistic transaction in the StreamReceiver contains an object whose key is the affinity key and whose contents are the set of keys that share that affinity key. We expect some temporal locality for objects with the same affinity key.

We had a number of worker threads on a client node, but only one data streamer, on which we increased the buffer count. Once we understood how the data streamer actually works, we gave each worker its own DataStreamer, roughly along the lines of the sketches below.
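For concreteness, here is a minimal sketch of the per-worker arrangement. The cache name "MyCache", the buffer size, and the way each worker is handed its share of the input are placeholders I made up for the example, not our actual code:

    import java.util.List;
    import java.util.Map;

    import org.apache.ignite.Ignite;
    import org.apache.ignite.IgniteDataStreamer;

    /** One instance per worker thread, so each worker can flush independently. */
    public class StreamerWorker implements Runnable {
        private final Ignite ignite;
        private final List<Map.Entry<Object, Object>> batch;   // this worker's share of the input

        public StreamerWorker(Ignite ignite, List<Map.Entry<Object, Object>> batch) {
            this.ignite = ignite;
            this.batch = batch;
        }

        @Override public void run() {
            // try-with-resources flushes and closes this worker's streamer when it finishes
            try (IgniteDataStreamer<Object, Object> streamer = ignite.dataStreamer("MyCache")) {
                streamer.allowOverwrite(true);     // required when installing a custom StreamReceiver
                streamer.perNodeBufferSize(512);   // arbitrary value; smaller per-worker batches
                // a custom StreamReceiver would be installed here via streamer.receiver(...)

                for (Map.Entry<Object, Object> e : batch)
                    streamer.addData(e.getKey(), e.getValue());

                streamer.flush();                  // stalls only this worker, not the others
            }
        }
    }

Each worker thread runs its own StreamerWorker, so a flush only waits for that worker's outstanding buffers.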
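And here is a rough sketch of the kind of StreamReceiver logic described above, which updates the shared per-affinity-key record inside an OPTIMISTIC transaction. The cache name "KeysByAffinityKey", the affinityKeyOf() helper, and the SERIALIZABLE isolation level (the combination that produces restarts on conflict) are my assumptions for illustration, and both caches would need to be TRANSACTIONAL:

    import java.util.Collection;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    import org.apache.ignite.Ignite;
    import org.apache.ignite.IgniteCache;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.stream.StreamReceiver;
    import org.apache.ignite.transactions.Transaction;
    import org.apache.ignite.transactions.TransactionOptimisticException;

    import static org.apache.ignite.transactions.TransactionConcurrency.OPTIMISTIC;
    import static org.apache.ignite.transactions.TransactionIsolation.SERIALIZABLE;

    /** Runs on the primary node for each batch; updates the shared per-affinity-key record transactionally. */
    public class IndexingReceiver implements StreamReceiver<Object, Object> {
        @Override public void receive(IgniteCache<Object, Object> cache,
                                      Collection<Map.Entry<Object, Object>> entries) {
            Ignite ignite = Ignition.localIgnite();
            IgniteCache<Object, Set<Object>> keysByAffinity = ignite.cache("KeysByAffinityKey");

            while (true) {
                try (Transaction tx = ignite.transactions().txStart(OPTIMISTIC, SERIALIZABLE)) {
                    for (Map.Entry<Object, Object> e : entries) {
                        cache.put(e.getKey(), e.getValue());

                        Object affKey = affinityKeyOf(e.getKey());
                        Set<Object> keys = keysByAffinity.get(affKey);
                        if (keys == null)
                            keys = new HashSet<>();
                        keys.add(e.getKey());
                        keysByAffinity.put(affKey, keys);   // shared record where concurrent streamer threads collide
                    }

                    tx.commit();
                    return;
                }
                catch (TransactionOptimisticException conflict) {
                    // another streamer thread touched the same affinity-key record; restart the transaction
                }
            }
        }

        private Object affinityKeyOf(Object key) {
            return key;   // placeholder: the real code derives the affinity key from the cache key
        }
    }

The retry loop around TransactionOptimisticException is where the restarts discussed below would show up.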
With a streamer per worker, each worker could issue a flush without affecting the other workers. That, in turn, allowed us to use smaller batches per worker, decreasing the odds of temporal locality.

So it seems we would get updates for the same affinity key on different data streamer threads, and they could conflict when updating the common record. The more keys per affinity key, the more likely a conflict, and the more data that would need to be saved. A flush operation could stall multiple workers, and the flush might be dependent on requests that are conflicting.

We chose OPTIMISTIC transactions for their lack-of-deadlock characteristics, not because we expected high contention. I do think this behavior suggests something sub-optimal about the OPTIMISTIC lock implementation, because I see a dramatic decrease in throughput but not a dramatic increase in transaction restarts.

The documentation's statement that "In OPTIMISTIC transactions, entry locks are acquired on primary nodes during the prepare step" does not say anything about the order in which locks are acquired. Sorting the locks into a consistent order would avoid deadlocks. If there are no deadlocks, then there could be n-1 restarts of the transaction for each commit, where n is the number of data streamer threads. This is the old "thundering herd" problem, which can easily be reduced to order n by allowing only one of the waiting threads to proceed at a time.

--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/