Date: Tue, 17 Nov 2015 18:30:59 +0100
Subject: Re: Ignite-1.5 Release
From: Vladisav Jelisavcic
To: dev@ignite.apache.org

Hi Yakov,

1. Yes.
2. If you mean that nodeMap is accessed in the onNodeRemoved(UUID nodeID) method of the GridCacheSemaphoreImpl class, it shouldn't be a problem, but it can easily be changed not to do so.
3.
org.apache.ignite.internal.processors.cache.datastructures.GridCacheAbstractDataStructuresFailoverSelfTest#testSemaphoreConstantTopologyChangeFailoverSafe()
org.apache.ignite.internal.processors.cache.datastructures.GridCacheAbstractDataStructuresFailoverSelfTest#testSemaphoreConstantMultipleTopologyChangeFailoverSafe()

I think the problem is with the atomicity of the simulated grid failure: once stopGrid() is called for a node, other threads on that same node start throwing interrupted exceptions, which in turn are not handled properly in GridCacheAbstractDataStructuresFailoverSelfTest. Those exceptions shouldn't be dealt with inside GridCacheSemaphoreImpl itself. In a real-world node failure scenario, all those threads would fail at the same time (none of them would influence the rest of the grid anymore).

I think fixing the issue Denis is working on can fix this (IGNITE-801 and IGNITE-803).

Am I right? Does it make sense?

Best regards,
Vladisav

On Tue, Nov 17, 2015 at 5:40 PM, Yakov Zhdanov wrote:
> Vladislav,
>
> I started to review the latest changes and have a couple of questions:
>
> 1. latest changes are here - https://github.com/apache/ignite/pull/120? Is
> that correct?
> 2.
> org.apache.ignite.internal.processors.datastructures.GridCacheSemaphoreImpl.Sync#nodeMap
> is accessed in both sync and unsync context. Are you sure this is fine?
> 3. As for the failing test - can you please isolate it into a separate junit
> or point out an existing one?
>
> --Yakov
>
> 2015-11-11 12:33 GMT+03:00 Vladisav Jelisavcic:
>
> > Yakov,
> >
> > sorry for running a bit late.
> >
> > > Vladislav, do you have any updates for
> > > https://issues.apache.org/jira/browse/IGNITE-638? Or any questions?
> > >
> > > --Yakov
> >
> > I have problems with some fail-over scenarios;
> > It seems that if the two nodes are in the middle of acquiring or releasing
> > the semaphore,
> > and one of them fails, all nodes get:
> >
> > [09:36:38,509][ERROR][ignite-#13%pub-null%][GridCacheSemaphoreImpl]
> > Failed to compare and set:
> > o.a.i.i.processors.datastructures.GridCacheSemaphoreImpl$Sync$1@5528b728
> > class org.apache.ignite.internal.cluster.ClusterTopologyCheckedException:
> > Failed to acquire lock for keys (primary node left grid, retry transaction
> > if possible) [keys=[UserKeyCacheObjectImpl [val=GridCacheInternalKeyImpl
> > [name=ac83b8cb-3052-49a6-9301-81b20b0ecf3a], hasValBytes=true]],
> > node=c321fcc4-5db5-4b03-9811-6a5587f2c253]
> > ...
> > Caused by: class
> > org.apache.ignite.internal.cluster.ClusterTopologyCheckedException: Failed
> > to acquire lock for keys (primary node left grid, retry transaction if
> > possible) [keys=[UserKeyCacheObjectImpl [val=GridCacheInternalKeyImpl
> > [name=ac83b8cb-3052-49a6-9301-81b20b0ecf3a], hasValBytes=true]],
> > node=c321fcc4-5db5-4b03-9811-6a5587f2c253]
> > at
> > org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedLockFuture.newTopologyException(GridDhtColocatedLockFuture.java:1199)
> > ... 10 more
> >
> > I'm still trying to find out how to exactly reproduce this behavior,
> > I'll send you more details once I try few more things.
> >
> > I am still using partitioned cache, does it make sense to use replicated
> > cache instead?
> >
> > Other than that, I'm done with everything else.
> >
> > Thanks,
> > Vladisav
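
The interrupt-handling point raised in the reply at the top of this message (threads on a stopped node throwing interrupted exceptions that the test does not handle) can be illustrated with plain JDK classes. This is only a sketch under stated assumptions: it uses java.util.concurrent.Semaphore rather than GridCacheSemaphoreImpl, ExecutorService.shutdownNow() stands in for stopGrid(), and the class and method names are hypothetical and not part of Ignite or its test suite.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical illustration; not Ignite code. */
public class InterruptAwareWorkers {
    /**
     * Starts the given number of workers blocked on a semaphore, then
     * interrupts them all at once and returns how many exited cleanly.
     */
    public static int stopAndCountCleanExits(int workers) throws InterruptedException {
        Semaphore sem = new Semaphore(0); // no permits, so every acquire() blocks
        CountDownLatch started = new CountDownLatch(workers);
        AtomicInteger cleanExits = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(workers);

        for (int i = 0; i < workers; i++) {
            pool.execute(() -> {
                started.countDown();
                try {
                    sem.acquire();
                    sem.release();
                } catch (InterruptedException e) {
                    // The caller handles the interruption: restore the flag
                    // and treat it as an expected consequence of the stop.
                    Thread.currentThread().interrupt();
                    cleanExits.incrementAndGet();
                }
            });
        }

        started.await();    // wait until every worker is running
        pool.shutdownNow(); // assumption: stands in for stopGrid() on the node

        if (!pool.awaitTermination(5, TimeUnit.SECONDS))
            throw new IllegalStateException("workers did not terminate");

        return cleanExits.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("clean exits: " + stopAndCountCleanExits(4));
    }
}
```

The design choice mirrors the argument in the reply: the semaphore itself just propagates InterruptedException, and the code that drives the workers (here the lambda, in the discussion the failover test) decides that an interruption during a simulated stop is expected.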