Date: Tue, 6 Nov 2018 13:34:00 +0000 (UTC)
From: "ASF GitHub Bot (JIRA)"
To: issues@ignite.apache.org
Reply-To: dev@ignite.apache.org
Subject: [jira] [Commented] (IGNITE-9840) Possible deadlock on transactional future on client node in case of network problems or long GC pauses


    [ https://issues.apache.org/jira/browse/IGNITE-9840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16676770#comment-16676770 ]

ASF GitHub Bot commented on IGNITE-9840:
----------------------------------------

GitHub user ilantukh opened a pull request:

    https://github.com/apache/ignite/pull/5268

    IGNITE-9840

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gridgain/apache-ignite ignite-9840

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/ignite/pull/5268.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5268

----
commit 8423138902a4e58f98b7e49bd6df19c07b07df88
Author: Ilya Lantukh
Date:   2018-10-31T15:03:51Z

    IGNITE-9840 : Possible fix.

commit 980689f57fca00ee0bbd675fd64d8328079ee1bd
Author: Ilya Lantukh
Date:   2018-11-06T13:27:18Z

    IGNITE-9840 : Cosmetic changes

----

> Possible deadlock on transactional future on client node in case of network problems or long GC pauses
> ------------------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-9840
>                 URL: https://issues.apache.org/jira/browse/IGNITE-9840
>             Project: Ignite
>          Issue Type: Bug
>          Components: clients
>    Affects Versions: 2.6
>            Reporter: Andrey Aleksandrov
>            Assignee: Alexey Stelmak
>            Priority: Critical
>             Fix For: 2.8
>
>
> Steps to reproduce:
> 1) Start the server node with the following timeouts. DefaultTxTimeout should be greater than the others:
>
> {code:java}
>
> {code}
> On the server side, create a cache with the following parameters:
>
> {code:java}
>
> {code}
> 2) After that, start the client with the following code:
> {code:java}
> IgniteCache cache = ignite.getOrCreateCache("CACHE");
> try (Transaction tx = ignite.transactions().txStart()) {
>     cache.put("Key", new Object());
>     System.out.println("Stop me");
>     // Here we will get a long GC pause on the server side.
>     Thread.sleep(10000);
>     // Commit the transaction.
>     tx.commitAsync().get();
> }
> {code}
>
> At the "Stop me" step, suspend all threads on the server side to emulate a network problem or a long GC pause on the server.
> Finally, the client node will log the following:
> {code:java}
> [2018-10-10 16:46:10,157][ERROR][nio-acceptor-tcp-comm-#28%GRIDC1%][root] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED]]], failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=grid-timeout-worker, igniteInstanceName=GRIDC1, finished=false, heartbeatTs=1539179057570]]]
> {code}
> A similar issue can also be reproduced in 2.4. In both cases it looks like we have a deadlock while trying to display the TxEntryValueHolder. It looks like these values are already used by the transaction with the long DefaultTxTimeout.
> {code:java}
> java.lang.Thread.State: WAITING
> at sun.misc.Unsafe.park(Unsafe.java:-1)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
> at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177)
> at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140)
> at org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.metadata0(CacheObjectBinaryProcessorImpl.java:526)
> at org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.metadata(CacheObjectBinaryProcessorImpl.java:510)
> at org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl$2.metadata(CacheObjectBinaryProcessorImpl.java:193)
> at org.apache.ignite.internal.binary.BinaryContext.metadata(BinaryContext.java:1265)
> at org.apache.ignite.internal.binary.BinaryUtils.type(BinaryUtils.java:2407)
> at org.apache.ignite.internal.binary.BinaryObjectImpl.rawType(BinaryObjectImpl.java:302)
> at org.apache.ignite.internal.binary.BinaryObjectExImpl.toString(BinaryObjectExImpl.java:205)
> at org.apache.ignite.internal.binary.BinaryObjectExImpl.toString(BinaryObjectExImpl.java:186)
> at org.apache.ignite.internal.binary.BinaryObjectImpl.toString(BinaryObjectImpl.java:919)
> at java.lang.String.valueOf(String.java:2994)
> at java.lang.StringBuilder.append(StringBuilder.java:131)
> at org.apache.ignite.internal.processors.cache.transactions.TxEntryValueHolder.toString(TxEntryValueHolder.java:161)
> ...{code}
> On the client side, this can look like a hanging transaction because we are waiting on:
> {code:java}
> tx.commitAsync().get();{code}
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)