Date: Tue, 20 Feb 2018 14:33:45 -0700 (MST)
From: Dave Harvey
To: user@ignite.apache.org
Message-ID: <1519162425930-0.post@n6.nabble.com>
In-Reply-To: <1518582253663-0.post@n6.nabble.com>
References: <1518518498682-0.post@n6.nabble.com> <1518582253663-0.post@n6.nabble.com>
Subject: Re: 20 minute 12x throughput drop using data streamer and Ignite persistence

I've started reproducing this issue with more statistics. I have not yet reached the worst performance point, but some things are starting to become clearer.

The DataStreamer hashes the affinity key to a partition, then maps the partition to a node, and fills a single buffer at a time for that node. A DataStreamer thread on the node therefore gets a buffer's worth of requests grouped by the time of the addData() call, with no per-thread grouping by affinity key (as I had originally assumed).

The test I was running used a large amount of data where the average number of keys per unique affinity key is 3, with some outliers up to 50K. One of the caches being updated in the optimistic transaction in the StreamReceiver contains an object whose key is the affinity key and whose contents are the set of keys that share that affinity key. We expect some temporal locality for objects with the same affinity key.

We had a number of worker threads on a client node, but only one data streamer, on which we increased the buffer count. Once we understood how the data streamer actually works, we gave each worker its own DataStreamer, roughly along the lines of the sketches below.
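For concreteness, here is a minimal sketch of the per-worker arrangement. The cache name "MyCache", the buffer size, and the way each worker is handed its share of the input are placeholders I made up for the example, not our actual code:

    import java.util.List;
    import java.util.Map;

    import org.apache.ignite.Ignite;
    import org.apache.ignite.IgniteDataStreamer;

    /** One instance per worker thread, so each worker can flush independently. */
    public class StreamerWorker implements Runnable {
        private final Ignite ignite;
        private final List<Map.Entry<Object, Object>> batch;   // this worker's share of the input

        public StreamerWorker(Ignite ignite, List<Map.Entry<Object, Object>> batch) {
            this.ignite = ignite;
            this.batch = batch;
        }

        @Override public void run() {
            // try-with-resources flushes and closes this worker's streamer when it finishes
            try (IgniteDataStreamer<Object, Object> streamer = ignite.dataStreamer("MyCache")) {
                streamer.allowOverwrite(true);     // required when installing a custom StreamReceiver
                streamer.perNodeBufferSize(512);   // arbitrary value; smaller per-worker batches
                // a custom StreamReceiver would be installed here via streamer.receiver(...)

                for (Map.Entry<Object, Object> e : batch)
                    streamer.addData(e.getKey(), e.getValue());

                streamer.flush();                  // stalls only this worker, not the others
            }
        }
    }

Each worker thread runs its own StreamerWorker, so a flush only waits for that worker's outstanding buffers.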
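And here is a rough sketch of the kind of StreamReceiver logic described above, which updates the shared per-affinity-key record inside an OPTIMISTIC transaction. The cache name "KeysByAffinityKey", the affinityKeyOf() helper, and the SERIALIZABLE isolation level (the combination that produces restarts on conflict) are my assumptions for illustration, and both caches would need to be TRANSACTIONAL:

    import java.util.Collection;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    import org.apache.ignite.Ignite;
    import org.apache.ignite.IgniteCache;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.stream.StreamReceiver;
    import org.apache.ignite.transactions.Transaction;
    import org.apache.ignite.transactions.TransactionOptimisticException;

    import static org.apache.ignite.transactions.TransactionConcurrency.OPTIMISTIC;
    import static org.apache.ignite.transactions.TransactionIsolation.SERIALIZABLE;

    /** Runs on the primary node for each batch; updates the shared per-affinity-key record transactionally. */
    public class IndexingReceiver implements StreamReceiver<Object, Object> {
        @Override public void receive(IgniteCache<Object, Object> cache,
                                      Collection<Map.Entry<Object, Object>> entries) {
            Ignite ignite = Ignition.localIgnite();
            IgniteCache<Object, Set<Object>> keysByAffinity = ignite.cache("KeysByAffinityKey");

            while (true) {
                try (Transaction tx = ignite.transactions().txStart(OPTIMISTIC, SERIALIZABLE)) {
                    for (Map.Entry<Object, Object> e : entries) {
                        cache.put(e.getKey(), e.getValue());

                        Object affKey = affinityKeyOf(e.getKey());
                        Set<Object> keys = keysByAffinity.get(affKey);
                        if (keys == null)
                            keys = new HashSet<>();
                        keys.add(e.getKey());
                        keysByAffinity.put(affKey, keys);   // shared record where concurrent streamer threads collide
                    }

                    tx.commit();
                    return;
                }
                catch (TransactionOptimisticException conflict) {
                    // another streamer thread touched the same affinity-key record; restart the transaction
                }
            }
        }

        private Object affinityKeyOf(Object key) {
            return key;   // placeholder: the real code derives the affinity key from the cache key
        }
    }

The retry loop around TransactionOptimisticException is where the restarts discussed below would show up.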
With a streamer per worker, each worker could issue a flush without affecting the other workers. That, in turn, allowed us to use smaller batches per worker, decreasing the odds of temporal locality.

So it seems we would get updates for the same affinity key on different data streamer threads, and they could conflict when updating the common record. The more keys per affinity key, the more likely a conflict, and the more data that would need to be saved. A flush operation could stall multiple workers, and the flush might be dependent on requests that are conflicting.

We chose OPTIMISTIC transactions for their lack-of-deadlock characteristics, not because we expected high contention. I do think this behavior suggests something sub-optimal about the OPTIMISTIC lock implementation, because I see a dramatic decrease in throughput but not a dramatic increase in transaction restarts.

The documentation's statement that "In OPTIMISTIC transactions, entry locks are acquired on primary nodes during the prepare step" does not say anything about the order in which locks are acquired. Sorting the locks into a consistent order would avoid deadlocks. If there are no deadlocks, then there could be n-1 restarts of the transaction for each commit, where n is the number of data streamer threads. This is the old "thundering herd" problem, which can easily be reduced to order n by allowing only one of the waiting threads to proceed at a time.

--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/