From dev-return-31443-archive-asf-public=cust-asf.ponee.io@ignite.apache.org Tue Feb 27 15:12:42 2018 Return-Path: X-Original-To: archive-asf-public@cust-asf.ponee.io Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by mx-eu-01.ponee.io (Postfix) with SMTP id DF1B3180651 for ; Tue, 27 Feb 2018 15:12:41 +0100 (CET) Received: (qmail 8850 invoked by uid 500); 27 Feb 2018 14:12:40 -0000 Mailing-List: contact dev-help@ignite.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@ignite.apache.org Delivered-To: mailing list dev@ignite.apache.org Received: (qmail 8833 invoked by uid 99); 27 Feb 2018 14:12:40 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd1-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 27 Feb 2018 14:12:40 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd1-us-west.apache.org (ASF Mail Server at spamd1-us-west.apache.org) with ESMTP id AC06FC0A16 for ; Tue, 27 Feb 2018 14:12:39 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd1-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 2.379 X-Spam-Level: ** X-Spam-Status: No, score=2.379 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, HTML_MESSAGE=2, KAM_NUMSUBJECT=0.5, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01, SPF_PASS=-0.001] autolearn=disabled Authentication-Results: spamd1-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com Received: from mx1-lw-eu.apache.org ([10.40.0.8]) by localhost (spamd1-us-west.apache.org [10.40.0.7]) (amavisd-new, port 10024) with ESMTP id pMAjm6nhDdli for ; Tue, 27 Feb 2018 14:12:37 +0000 (UTC) Received: from mail-it0-f47.google.com (mail-it0-f47.google.com [209.85.214.47]) by mx1-lw-eu.apache.org (ASF Mail Server at mx1-lw-eu.apache.org) with ESMTPS id 3AB1E5F568 for ; Tue, 27 Feb 2018 14:12:36 +0000 (UTC) Received: by mail-it0-f47.google.com with SMTP id c11so5940316ith.4 for ; Tue, 27 Feb 2018 06:12:36 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to; bh=hv+gdtUtn1orYFuVl4hjEXvL2tQeuLYfG9JmNSBNrkM=; b=ooRjUFmSWi/yFU4BUzud3BPrUjI2Vs4rB8PTivq0yc9iqy/Yz3LI1aQ0TB6xelCg2j h8nNyT6TENwjZnPcQe4ZH/iaf/jcGa2eDYF7RCSis05zbHC8Mh2qQ3RSV7cwRvbxuzWr E1dbd9DVdFARPVf6SVUNlhbkUCFvafxtz5PamF722wgVGANvpbjqkWSgzPM/iZxEHyPZ Psc/9sijDrwdDZ9s+CGsTSpCspaQ3SRGS0JS+hpZvuVtNePszf3qVIgquTrIM87ptlLf L6JxQW2szdWuJWC4aPAVLEzUF1HkjCX+tCpMbost7YznD3hgPVxBJKQbhucO8IGG8lro Gsiw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to; bh=hv+gdtUtn1orYFuVl4hjEXvL2tQeuLYfG9JmNSBNrkM=; b=bTff8tbAQNFE68Ipcx41g875m/8daD0okwud0Ad8GdKbfyMasoIsIOCQwTFz7VP35Z dJfxvx/zmu7Zd1vtfPSufF3Mg3NcEYWSl6dI1caSJmX1OGbj+w52S4RScWDPo+d0TWfp 6hTWSLAO6xmMfjd12Tan00lo9+ouguVHJgvTYKsuVKI4/etyAXjCQcBHXtL0ydWqk/Ql qwACUvFm77/LtV2+CqMoPK1RWaeGIrEa9RHVHaERpwQjlOBoIMCLbeXUKg+u9R2i90JI T+WAZWt1/c1J8aLrmIamWlh/VBm66a4Pyt/j3gC9XpvHjfFH94rYJQ1nLK6mP51A6kJQ 6BzA== X-Gm-Message-State: APf1xPAusnqlA6sImql/zvljf1Q9o+lMuXDfvmWDj7n/YZsEHEA84pr3 J2Esqr/LywR8xLv9sWnpWi+rWo91qj/CSpnYV6U= X-Google-Smtp-Source: AG47ELv+sbNt1ZRWlmYNB8HxhN3EEZ0shXxAQQkoIUWhW0ncDxkycA61kH9Yq72FIb1LRsGalmsVJw0uCV1d3aH75ok= X-Received: by 10.36.41.84 with SMTP id p81mr13014529itp.105.1519740754781; Tue, 27 Feb 2018 06:12:34 -0800 (PST) MIME-Version: 1.0 References: <0B392DA6-09A6-4EDC-B1EE-4F50851BCBBE@apache.org> In-Reply-To: From: Dmitry Pavlov Date: Tue, 27 Feb 2018 14:12:23 +0000 Message-ID: Subject: Re: Stop nodes after test by default - IGNITE-6842 To: dev@ignite.apache.org Content-Type: multipart/alternative; boundary="001a113f679ad6f5e105663237ea" --001a113f679ad6f5e105663237ea Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Hi, thank you! I've found several suspicious fails: such test fails have rate less than 1%, it is probably new failures. It would be great if we can fix it before merge. Could you address this fails? Sincerely, Dmitriy Pavlov IgniteCacheTestSuite5: IgniteCacheStoreCollectionTest.testStoreMap (fail rate 0,0%) IgniteCacheTestSuite5: CacheLateAffinityAssignmentTest.testDelayAssignmentClientJoin (fail rate 0,0%) IgniteCacheWithIndexingTestSuite: CacheRandomOperationsMultithreadedTest.testAtomicOffheapEviction (fail rate 0,0%) IgniteCacheWithIndexingTestSuite: CacheRandomOperationsMultithreadedTest.testAtomicOffheapEvictionIndexing (fail rate 0,0%) IgniteCacheWithIndexingTestSuite: CacheRandomOperationsMultithreadedTest.testTxOffheapEviction (fail rate 0,0%) IgniteCacheWithIndexingTestSuite: CacheRandomOperationsMultithreadedTest.testTxOffheapEvictionIndexing (fail rate 0,0%) IgniteBinarySimpleNameMapperCacheFullApiTestSuite: GridCachePartitionedNearDisabledMultiNodeWithGroupFullApiSelfTest.testWithS= kipStoreTx (fail rate 0,0%) =D0=B2=D1=82, 27 =D1=84=D0=B5=D0=B2=D1=80. 2018 =D0=B3. =D0=B2 17:04, Maxim= Muzafarov : > Yep, link may not be correct. > > Here is correct version: > TC: * > https://ci.ignite.apache.org/project.html?projectId=3DIgniteTests24Java8&= branch_IgniteTests24Java8=3Dpull%2F3542%2Fhead > < > https://ci.ignite.apache.org/project.html?projectId=3DIgniteTests24Java8&= branch_IgniteTests24Java8=3Dpull%2F3542%2Fhead > >* > > > =D0=B2=D1=82, 27 =D1=84=D0=B5=D0=B2=D1=80. 2018 =D0=B3. =D0=B2 16:41, Dmi= try Pavlov : > > > Hi Maxim, > > > > could you please provide link to TC run on your PR? It seems link > provided > > points to run of master. In changes field you may select pull/3542/head > > before starting RunAll. > > > > Igniters, > > > > this change is related to our test framework, so change may affect your > > tests. Please join to review > > https://reviews.ignite.apache.org/ignite/review/IGNT-CR-502 > > > > Sincerely, > > Dmitriy Pavlov > > > > =D0=B2=D1=82, 27 =D1=84=D0=B5=D0=B2=D1=80. 2018 =D0=B3. =D0=B2 16:14, M= axim Muzafarov : > > > > > Hi all, > > > > > > I think, I've done with this issue, what should we do next? > > > > > > PR: https://github.com/apache/ignite/pull/3542 > > > Upsource: https://reviews.ignite.apache.org/ignite/review/IGNT-CR-502 > > > TC: > > > > > > > > > https://ci.ignite.apache.org/viewModification.html?modId=3D723895&persona= l=3Dfalse&buildTypeId=3D&tab=3DvcsModificationTests > > > JIRA: https://issues.apache.org/jira/browse/IGNITE-6842 > > > > > > > > > =D1=87=D1=82, 22 =D1=84=D0=B5=D0=B2=D1=80. 2018 =D0=B3. =D0=B2 14:12,= Dmitry Pavlov : > > > > > > > Hi Maxim, > > > > > > > > Thank you. > > > > > > > > To my mind stopAllGrids call should be kept in afterTestsStop(). If > > > > developer forgot to call super(), he will see exception from > > > > beforeTestsStart() > > > > added by you. > > > > > > > > If you think patch is ready to be reviewed, please change JIRA stat= us > > to > > > > "Patch Available". > > > > > > > > Optionally you can create Upsource review. It would be easier for > > > multiple > > > > reviewers to handle this patch. This test framework is used by all > > > Igniters > > > > so Upsource would be good option (https://reviews.ignite.apache.org= / > > ). > > > > > > > > Sincerely, > > > > Dmitriy Pavlov > > > > > > > > =D1=87=D1=82, 22 =D1=84=D0=B5=D0=B2=D1=80. 2018 =D0=B3. =D0=B2 13:4= 4, Maxim Muzafarov : > > > > > > > > > Hi all, > > > > > > > > > > I've made some changes based on our previous discusstions, please > see > > > PR > > > > > [1]: > > > > > 1) Remove duplicated code for stopGrid (by index and by name); > > > > > 2) Add new method that thows exception if not all grids haven't > > stopped > > > > > correctly; > > > > > 3) Change tests that have been affected by this changes; > > > > > > > > > > Also, I have some thoughts for clarification: > > > > > 1) beforeTestsStart() - I expect here in subtests that grids are > not > > > > > started yet. Am I right? > > > > > 2) I think we should call stopAllGrids in tearDown method. So, if > in > > > some > > > > > cases we'll override afterTestsStop in subclasses and forgot to > call > > > > > super() it won't lead us to exception. > > > > > > > > > > [1] https://github.com/apache/ignite/pull/3542 > > > > > [2] > https://ci.ignite.apache.org/viewModification.html?modId=3D717275 > > > > > [3] https://issues.apache.org/jira/browse/IGNITE-6842 > > > > > > > > > > > > > > > =D1=81=D1=80, 7 =D1=84=D0=B5=D0=B2=D1=80. 2018 =D0=B3. =D0=B2 18:= 28, Maxim Muzafarov : > > > > > > > > > > > Thank you all, > > > > > > > > > > > > I'll add this comment's for JIRA ticket, if you don't mind. > > > > > > > > > > > > =D1=81=D1=80, 7 =D1=84=D0=B5=D0=B2=D1=80. 2018 =D0=B3. =D0=B2 1= 5:16, Dmitry Pavlov < > dpavlov.spb@gmail.com > > >: > > > > > > > > > > > >> Yes, this solution allows to cover both cases: > > > > > >> a) not stopped node from previous test and > > > > > >> b) allows to remove useless code that stops Ignite nodes from > each > > > > test. > > > > > >> > > > > > >> =D1=81=D1=80, 7 =D1=84=D0=B5=D0=B2=D1=80. 2018 =D0=B3. =D0=B2 = 15:13, Anton Vinogradov < > > > > avinogradov@gridgain.com > > > > > >: > > > > > >> > > > > > >> > Maxim, > > > > > >> > > > > > > >> > We discussed with Dima privately, and decided > > > > > >> > > > > > > >> > 1) We have to assert that there is no alive nodes at > > > > > GridAbstractTest's > > > > > >> > beforeTestsStarted > > > > > >> > 2) We have to kill all alive nodes (without force) at > > > > > GridAbstractTest's > > > > > >> > afterTestsStopped > > > > > >> > 3) In case of any exceptions at #2 we have to see test error > > > > > >> > 4) We can get rid of all useless stopAllGrids at > > > GridAbstractTest's > > > > > >> > subclasses. > > > > > >> > > > > > > >> > > > > > > >> > > > > > > >> > On Wed, Feb 7, 2018 at 2:32 PM, Dmitry Pavlov < > > > > dpavlov.spb@gmail.com> > > > > > >> > wrote: > > > > > >> > > > > > > >> > > > Let's just add stopAllGrids(flase) to GridAbstractTest '= s > > > > > >> > > afterTestsStopped > > > > > >> > > method body. > > > > > >> > > > > > > > >> > > Can't agree with it becase implicit silent shutdown of nod= es > > > from > > > > > test > > > > > >> > > framework may cause a lot of bugs introduced to Ignite. > > > > > >> > > > > > > > >> > > I think stop+test fail should occur. > > > > > >> > > In that case author of incorrect test or not consistent > Ignite > > > > > >> shutdown > > > > > >> > > will see problem. > > > > > >> > > > > > > > >> > > 'Fail fast' if always better than hidding real problem und= er > > > > > automatic > > > > > >> > > framework feature. > > > > > >> > > > > > > > >> > > =D1=81=D1=80, 7 =D1=84=D0=B5=D0=B2=D1=80. 2018 =D0=B3. =D0= =B2 14:05, Anton Vinogradov < > > > > > >> avinogradov@gridgain.com > > > > > >> > >: > > > > > >> > > > > > > > >> > > > Hi all, > > > > > >> > > > > > > > > >> > > > > I've made some research about this problem and i think > > that > > > in > > > > > >> > general > > > > > >> > > we > > > > > >> > > > > should move stopAllGrids method in GridAbstractTest > class > > to > > > > > >> > > > > afterTestsStopped method with some changes. Am I right= ? > > > > > >> > > > Let's just add stopAllGrids(flase) to GridAbstractTest '= s > > > > > >> > > > afterTestsStopped method > > > > > >> > > > body. > > > > > >> > > > > > > > > >> > > > > I have a question about stopAllGrids(boolean cancel) > this > > > > > "cancel" > > > > > >> > > > That's just a flag means "do not wait for any operations > > > finish" > > > > > >> > > > See no reason to set it true in that case. > > > > > >> > > > > > > > > >> > > > > If you delegate closing to afterTestsStopped this will > > > affect > > > > > only > > > > > >> > > > > last test (method). > > > > > >> > > > The idea is to stop all nodes at last test's method > finish. > > > > > >> > > > > > > > > >> > > > > Nodes that survive between tests can affect successiv= e > > > > > >> > > > tests. > > > > > >> > > > Thats ok. we have a lot tests where nodes reusable betwe= en > > > > test's > > > > > >> > > methods. > > > > > >> > > > > > > > > >> > > > > > > > > >> > > > On Wed, Feb 7, 2018 at 1:20 PM, Dmitry Pavlov < > > > > > >> dpavlov.spb@gmail.com> > > > > > >> > > > wrote: > > > > > >> > > > > > > > > >> > > > > Hi Igniters, > > > > > >> > > > > > > > > > >> > > > > IMO nodes that survive between tests is not general > > practice > > > > and > > > > > >> I'm > > > > > >> > > not > > > > > >> > > > > sure is a best practice. I suggest to mark such tests > with > > > > some > > > > > >> > method > > > > > >> > > > > overriden from AbstractTest. > > > > > >> > > > > > > > > > >> > > > > About cancel flag, please note stopAllGrids(boolean > > cancel) > > > > > >> > > cancel=3Dfalse > > > > > >> > > > > allows to wait of checkpoint ends in case persistence > > > enabled. > > > > > >> > > > > > > > > > >> > > > > I still suggest to avoid stopping any nodes by test, b= ut > > > > > validate > > > > > >> not > > > > > >> > > > > stopped node exist and fail test instead of siltent > > implicit > > > > > >> actions. > > > > > >> > > > > > > > > > >> > > > > Sincerely, > > > > > >> > > > > Dmitriy Pavlov > > > > > >> > > > > > > > > > >> > > > > =D1=81=D1=80, 7 =D1=84=D0=B5=D0=B2=D1=80. 2018 =D0=B3.= =D0=B2 13:04, Andrey Kuznetsov < > > > > > stkuzma@gmail.com > > > > > >> >: > > > > > >> > > > > > > > > > >> > > > > > Hi Maxim, > > > > > >> > > > > > > > > > > >> > > > > > Regarding your first question, the use of > > > afterTestsStopped > > > > is > > > > > >> not > > > > > >> > > > enough > > > > > >> > > > > > to stop all nodes, since each individual test (metho= d) > > can > > > > > start > > > > > >> > > custom > > > > > >> > > > > set > > > > > >> > > > > > of notes during its operation, and this very test > should > > > > stop > > > > > >> all > > > > > >> > > those > > > > > >> > > > > > nodes. If you delegate closing to afterTestsStopped > this > > > > will > > > > > >> > affect > > > > > >> > > > only > > > > > >> > > > > > last test (method). Nodes that survive between tests > can > > > > > affect > > > > > >> > > > > successive > > > > > >> > > > > > tests. > > > > > >> > > > > > > > > > > >> > > > > > 2018-02-07 1:10 GMT+03:00 Maxim Muzafarov < > > > > maxmuzaf@gmail.com > > > > > >: > > > > > >> > > > > > > > > > > >> > > > > > > Hello, > > > > > >> > > > > > > > > > > > >> > > > > > > I've made some research about this problem and i > think > > > > that > > > > > in > > > > > >> > > > general > > > > > >> > > > > we > > > > > >> > > > > > > should move stopAllGrids method in GridAbstractTes= t > > > class > > > > to > > > > > >> > > > > > > afterTestsStopped method with some changes. Am I > > right? > > > > > >> > > > > > > > > > > > >> > > > > > > Also, I have a question about stopAllGrids(boolean > > > cancel) > > > > > >> this > > > > > >> > > > > "cancel" > > > > > >> > > > > > > argument. Why in some cases we should interrupt > > > ComputeJob > > > > > >> and in > > > > > >> > > > some > > > > > >> > > > > > > cases shouldn't? For example here: > > > > > >> > > > > > > > IgniteBaselineAffinityTopologyActivationTest#afterTest > > > > > >> > > > > > > we call method stopAllGrids(false) this way. Why n= ot > > > > "true" > > > > > >> > > argument > > > > > >> > > > > > > instead? > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > -- > > > > > >> > > > > > Best regards, > > > > > >> > > > > > Andrey Kuznetsov. > > > > > >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > > > > > > > > > > > > > > > > > --001a113f679ad6f5e105663237ea--