From: Andor Molnár
To: dev@zookeeper.apache.org, Patrick Hunt
Subject: Re: Trying to find pattern in Flaky Tests
Date: Thu, 19 Jul 2018 22:24:21 +0200

Thanks Pat for fixing the report. Very useful.

I think it was unable to retrieve the test report because no test report
was available for those builds at the time it was running. We might want
to expose the error code in the logs below, but if you take a look at #104
and #100, the test report is still not available. Build #104 seems to have
been interrupted during the tests, and #100 hit a timeout and didn't
publish a test report either. Maybe #108 was still running when the flaky
report tried to grab its report.

The latest run had the same issue with this build:
https://builds.apache.org/job/ZooKeeper_branch35_jdk8/1044/console
"No space left on device" - definitely a problem :)

We don't need to do anything about this.
Skipping these builds is just fine.

Regards,
Andor


On 07/18/2018 08:16 PM, Patrick Hunt wrote:
> Ok, I committed a change that seems to address the main failure:
> https://github.com/apache/zookeeper/commit/06b9507ab78a1a055b8f467846c15791600b72ee
>
> https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper-Find-Flaky-Tests/lastSuccessfulBuild/artifact/report.html
>
> However I do notice some oddness in the sense that for some jobs/runs it
> fails to get the information from the REST interface, even though it's fine
> for most of them, take a look, any ideas?
> https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper-Find-Flaky-Tests/456/console
>
> [ZooKeeper-Find-Flaky-Tests] $ /bin/bash /tmp/jenkins4452773653790031730.sh
> ERROR:__main__:failed to get:
> https://builds.apache.org/job/ZooKeeper-trunk/108/testReport/api/json?tree=suites%5Bname%2Ccases%5BclassName%2Cname%2Cstatus%5D%5D
> ERROR:__main__:failed to get:
> https://builds.apache.org/job/ZooKeeper-trunk/104/testReport/api/json?tree=suites%5Bname%2Ccases%5BclassName%2Cname%2Cstatus%5D%5D
> ERROR:__main__:failed to get:
> https://builds.apache.org/job/ZooKeeper-trunk/100/testReport/api/json?tree=suites%5Bname%2Ccases%5BclassName%2Cname%2Cstatus%5D%5D
>
>
> Notice that it doesn't complain about job 107 (etc...)
>
> Any ideas on this? Have you seen this before? Perhaps we should open an
> INFRA jira?
>
> Patrick
>
> On Wed, Jul 18, 2018 at 10:52 AM Patrick Hunt wrote:
>
>> FYI, created this:
>> https://issues.apache.org/jira/browse/INFRA-16785
>> for the security warnings, not sure if that's causing the issue. Likely
>> it's the recent jenkins upgrade, looking into it a bit...
>>
>> Patrick
>>
>>
>> On Wed, Jul 18, 2018 at 9:48 AM Michael Han wrote:
>>
>>> Hi Andor,
>>>
>>>>> I suspect it should succeed eventually if we were to increase the
>>>>> timeout even more. But is that correct? Bug or infrastructure issue?
>>>
>>> You could set up a dedicated git branch with all the patches (e.g. the
>>> one in ZOOKEEPER-2251) you want to apply, and I can set up a dedicated
>>> Jenkins job that points to this branch and stress tests the entire unit
>>> test suite. Some tests are only flaky when they run on Apache
>>> infrastructure and when they run together.
>>>
>>> It would be interesting to figure out what causes this test to fail.
>>> Since the same test works reliably in 3.4, there must be some commits in
>>> 3.5 that we could possibly blame...
>>>
>>>>> I'm going to raise a ticket on that if somebody is willing to fix it.
>>> I just had a brief look before Jenkins went down. Looks like python was
>>> complaining about some SSL stuff and I suspect if we upgrade to a later
>>> version of python (3.x) it might work. I'll try that later when Jenkins
>>> is back.
>>>
>>>
>>> On Wed, Jul 18, 2018 at 8:42 AM, Andor Molnar wrote:
>>>
>>>> Hi,
>>>>
>>>> *branch-3.4*
>>>>
>>>> I've taken a quick look at our Jenkins builds and in terms of flaky
>>>> tests, it looks like branch-3.4 is in pretty good shape. The build
>>>> hasn't failed for 5-6 days on all JDKs, which I think is pretty awesome.
>>>>
>>>> *branch-3.5*
>>>>
>>>> This branch is in very bad condition, which is quite unfortunate given
>>>> we're in the middle of stabilising it. :)
>>>> Especially on JDK8, the last successful build was 11 days ago. JDK9 (50%
>>>> failing) and JDK10 (30% failing) are looking better in the last 10
>>>> builds.
>>>> Interestingly (apart from a few quite rare ones) it looks like there's
>>>> only 1 test which is quite nasty on this branch:
>>>> testManyChildWatchersAutoReset
>>>>
>>>> There's a Jira about fixing it and a fix has been merged that increases
>>>> the timeout of the test, but it is also possible that a bug on the
>>>> branch causes the test to fail even with a 10 min timeout.
>>>>
>>>> I wasn't able to repro the failing test on my machine (Mac and
>>>> CentOS7), it always finished in 30-40 seconds maximum. On jenkins slaves
>>>> it shows the following:
>>>>
>>>> *JDK 8:*
>>>>
>>>> Report creation timed out.
>>>>
>>>>
>>>> *JDK 9:*
>>>>
>>>> Test Results Analyzer, testManyChildWatchersAutoReset
>>>> (each duration linked to ZooKeeper_branch35_java9/<build>/testReport/
>>>> org.apache.zookeeper.test/DisconnectedWatcherTest/
>>>> testManyChildWatchersAutoReset)
>>>>
>>>> Build   Duration (s)
>>>> 351     45.604
>>>> 350     600.337
>>>> 349     21.904
>>>> 348     583.063
>>>> 347     600.325
>>>> 346     600.383
>>>> 345     600.362
>>>> 344     21.139
>>>> 343     24.031
>>>> 342     584.200
>>>> 341     600.327
>>>> 340     600.323
>>>> 339     23.737
>>>> 338     600.406
>>>> 337     547.004
>>>> 336     600.393
>>>> 335     N/A
>>>> 334     373.955
>>>>
>>>>
>>>> *JDK 10:*
>>>>
>>>> Test Results Analyzer, testManyChildWatchersAutoReset
>>>> (each duration linked to ZooKeeper_branch35_java10/<build>/testReport/
>>>> org.apache.zookeeper.test/DisconnectedWatcherTest/
>>>> testManyChildWatchersAutoReset)
>>>>
>>>> Build   Duration (s)
>>>> 110     364.945
>>>> 109     543.983
>>>> 108     388.182
>>>> 107     600.446
>>>> 106     600.025
>>>> 105     535.046
>>>> 104     600.306
>>>> 103     474.005
>>>> 102     560.925
>>>> 101     600.328
>>>> 100     558.547
>>>> 99      600.397
>>>> 98      600.414
>>>> 97      430.383
>>>> 96      564.064
>>>> 95      600.357
>>>> 94      432.435
>>>> 93      596.378
>>>> 92      39.242
>>>>
>>>>
>>>> It takes ages to complete on Jenkins for some reason and it looks like
>>>> it quite frequently ends close to the limit, so I suspect it should
>>>> succeed eventually if we were to increase the timeout even more. But is
>>>> that correct?
>>>> Bug or infrastructure issue?
>>>>
>>>> *master / 3.6*
>>>>
>>>> Pretty much the same as 3.5. I haven't seen
>>>> testManyChildWatchersAutoReset failing on this branch with JDK8, which
>>>> is a bit confusing, but other than that I see the same pattern on JDK9
>>>> and JDK10. I'm unable to generate the above reports here, because Test
>>>> Result Analyzer keeps timing out for me, but I'll follow up when I have
>>>> them.
>>>>
>>>> Btw. Flaky Test report has been broken for 10 days, I'm going to raise a
>>>> ticket on that if somebody is willing to fix it. (I'm planning to do so.)
>>>> It would be nice to see the report working again, because if my
>>>> observations are correct, we don't have too many annoying tests apart
>>>> from the one mentioned.
>>>>
>>>> Thanks,
>>>> Andor
>>>>
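On the suggestion at the top of this thread to expose the error code in
the "failed to get" log lines: below is a minimal Python sketch of what
that could look like, not the actual flaky-test script. The function names
(build_report_url, fetch_test_report) and the `opener` parameter are
hypothetical; only the URL layout and the "tree" filter are taken from the
logged URLs. It assumes a missing test report comes back as an HTTP error
(presumably a 404), which has not been verified against builds.apache.org.

```python
import json
import logging
import urllib.error
import urllib.parse
import urllib.request

logger = logging.getLogger(__name__)

# Same "tree" filter that appears (URL-encoded) in the logged URLs.
TREE = "suites[name,cases[className,name,status]]"


def build_report_url(job_url, build):
    """Return the testReport JSON URL for one build of a Jenkins job."""
    query = urllib.parse.urlencode({"tree": TREE})
    return "%s/%d/testReport/api/json?%s" % (job_url.rstrip("/"), build, query)


def fetch_test_report(url, opener=urllib.request.urlopen):
    """Return the parsed report, or None after logging why the fetch failed.

    `opener` is a hypothetical hook so the function can be tried offline.
    """
    try:
        with opener(url) as response:
            return json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # HTTPError must be caught before URLError, its parent class.
        logger.error("failed to get: %s (HTTP %d %s)", url, err.code, err.reason)
    except urllib.error.URLError as err:
        # No HTTP response at all, e.g. DNS or SSL trouble.
        logger.error("failed to get: %s (%s)", url, err.reason)
    return None
```

Calling build_report_url("https://builds.apache.org/job/ZooKeeper-trunk", 108)
reproduces the first URL from the log above. With the status code in the
message, a build that never published a report could be told apart from
SSL or connectivity problems without opening each build by hand.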