From: Sachin Goel
Date: Fri, 17 Jul 2015 21:40:15 +0530
Subject: Re: Failing tests on Windows
To: dev@flink.apache.org

Since the failing tests on Windows have come up again: I also found some
failing tests when the community was testing the release candidates for
the 0.9.0 release. Here is one of the log outputs:
http://pastebin.com/raw.php?i=VWbx2ppf
These errors came from running mvn clean verify.
The following tests failed:

  BlobUtilsTest.before:45 null
  BlobUtilsTest.before:45 null
  BlobServerDeleteTest.testDeleteFails:291 null
  BlobLibraryCacheManagerTest.testRegisterAndDownload:196 Could not remove write permissions from cache directory
  BlobServerPutTest.testPutBufferFails:224 null
  BlobServerPutTest.testPutNamedBufferFails:286 null
  JobManagerStartupTest.before:55 null
  JobManagerStartupTest.before:55 null
  DataSinkTaskTest.testFailingDataSinkTask:317 Temp output file has not been removed
  DataSinkTaskTest.testFailingSortingDataSinkTask:358 Temp output file has not been removed
  TaskManagerTest.testSubmitAndExecuteTask:123 assertion failed: timeout (19998080696 nanoseconds) during expectMsgClass waiting for class org.apache.flink.runtime.messages.RegistrationMessages$RegisterTaskManager
  TaskManagerProcessReapingTest.testReapProcessOnFailure:133 TaskManager process did not launch the TaskManager properly. Failed to look up akka.tcp://flink@127.0.0.1:50673/user/taskmanager

Most of these again seem related to file system permissions and
timeouts. Please check whether any changes you make fix these too. It is
unlikely that the final release had these fixed, because no fixes were
explicitly filed for them. If you wish, file JIRAs for these too, in
case they still persist. Further, since the build stops at
flink-runtime, I can't be sure that no later tests would fail. I can
run the verify command again once there are fixes for these.

Cheers!
Sachin

-- Sachin Goel
Computer Science, IIT Delhi
m. +91-9871457685

On Fri, Jul 17, 2015 at 9:34 PM, Stephan Ewen wrote:

> Yes, please open JIRAs for that.
>
> If you want to provide some fixes, increasing the timeout in (4) is
> probably reasonable.
>
> On Fri, Jul 17, 2015 at 5:53 PM, Gábor Gévay wrote:
>
> > Hello!
> >
> > I tried to set up a development environment on Windows, but several
> > tests are failing:
> >
> > 1. The setWritable problem. This will be worked around by [1].
> >
> > 2. The tryCleanupOnError before close problem [2]. This could be
> > half-fixed by doing the fix described in point 2 of the comment I
> > wrote there, but I think that would still leave the problem open in
> > the FileSinkFunction. Should I open a PR for this?
> >
> > 3. CsvOutputFormatITCase fails with about 30% chance with
> > java.io.IOException: Unable to delete file:
> > C:\Users\Gabor\AppData\Local\Temp\org.apache.flink.streaming.api.outputformat.CsvOutputFormatITCase-result\1
> >     at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:2279)
> >     at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1653)
> >     at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1535)
> >     at org.apache.flink.test.util.TestBaseUtils.deleteRecursively(TestBaseUtils.java:508)
> >     at org.apache.flink.test.util.AbstractTestBase.deleteAllTempFiles(AbstractTestBase.java:141)
> >     at org.apache.flink.test.util.AbstractTestBase.stopCluster(AbstractTestBase.java:69)
> >     at org.apache.flink.streaming.util.StreamingProgramTestBase.testJobWithoutObjectReuse(StreamingProgramTestBase.java:118)
> >     <23 internal calls>
> >
> > I guess this is also some file closing issue.
> >
> >
> > Additionally, there are some more mysterious failures which happen
> > only from Maven, and I can't reproduce them when running a
> > test from the IDE:
> >
> > 4. testFindConnectableAddress(org.apache.flink.runtime.net.NetUtilsTest)
> > Time elapsed: 20.936 sec <<< FAILURE!
> > java.lang.AssertionError: null
> >     at org.junit.Assert.fail(Assert.java:86)
> >     at org.junit.Assert.assertTrue(Assert.java:41)
> >     at org.junit.Assert.assertTrue(Assert.java:52)
> >     at org.apache.flink.runtime.net.NetUtilsTest.testFindConnectableAddress(NetUtilsTest.java:54)
> >
> > It is interesting that it is not happening from the IDE, but I think
> > this is just because it gets less CPU time when some other tests are
> > running in parallel from Maven.
> > It takes 2-4 s from the IDE under
> > Windows, but it takes consistently very close to 2 s under Linux.
> > Maybe the 8 sec timeout could be raised under Windows? (Or what do you
> > think about trying to connect from the multiple interfaces in
> > parallel? That is, parallelizing the outer loop in
> > findAddressUsingStrategy.)
> >
> > 5. testGroupByFeedback(org.apache.flink.streaming.api.IterateTest)
> > Time elapsed: 12.091 sec <<< ERROR!
> > org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
> >     at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1.applyOrElse(JobManager.scala:314)
> >     at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
> >     at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
> >     at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
> >     at org.apache.flink.runtime.testingUtils.TestingJobManager$$anonfun$receiveTestingMessages$1.applyOrElse(TestingJobManager.scala:169)
> >     at scala.PartialFunction$OrElse.apply(PartialFunction.scala:162)
> >     at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:36)
> >     at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:29)
> >     at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
> >     at org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:29)
> >     at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
> >     at org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:93)
> >     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
> >     at akka.actor.ActorCell.invoke(ActorCell.scala:487)
> >     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
> >     at akka.dispatch.Mailbox.run(Mailbox.scala:221)
> >     at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
> >     at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> >     at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> >     at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> >     at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> > Caused by: java.lang.AssertionError: null
> >     at org.junit.Assert.fail(Assert.java:86)
> >     at org.junit.Assert.assertTrue(Assert.java:41)
> >     at org.junit.Assert.assertTrue(Assert.java:52)
> >     at org.apache.flink.streaming.api.IterateTest$6.close(IterateTest.java:447)
> >     at org.apache.flink.api.common.functions.util.FunctionUtils.closeFunction(FunctionUtils.java:40)
> >     at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.close(AbstractUdfStreamOperator.java:75)
> >     at org.apache.flink.streaming.runtime.tasks.StreamTask.closeOperator(StreamTask.java:182)
> >     at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.invoke(OneInputStreamTask.java:112)
> >     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:577)
> >     at java.lang.Thread.run(Thread.java:745)
> >
> > I have no idea what goes wrong here.
> >
> > 6. complexIntegrationTest1(org.apache.flink.streaming.api.complex.ComplexIntegrationTest)
> > Time elapsed: 15.989 sec <<< FAILURE!
> > java.lang.AssertionError: Different number of lines in expected and obtained result.
> > expected:<9> but was:<5>
> >     at org.junit.Assert.fail(Assert.java:88)
> >     at org.junit.Assert.failNotEquals(Assert.java:743)
> >     at org.junit.Assert.assertEquals(Assert.java:118)
> >     at org.junit.Assert.assertEquals(Assert.java:555)
> >     at org.apache.flink.test.util.TestBaseUtils.compareResultsByLinesInMemory(TestBaseUtils.java:272)
> >     at org.apache.flink.test.util.TestBaseUtils.compareResultsByLinesInMemory(TestBaseUtils.java:258)
> >     at org.apache.flink.streaming.api.complex.ComplexIntegrationTest.after(ComplexIntegrationTest.java:91)
> >
> > This is only happening with a chance of about 30%. There is one thing
> > in the code of this test which is a little suspicious to me: all tests
> > are using the same 'resultPath' and 'expected' variables. Can it not
> > happen that Maven runs these tests in the same JVM, and thus they step
> > on each other's feet?
> >
> >
> > Should I open JIRAs for the last four problems?
> >
> > Best regards,
> > Gabor
> >
> > [1] https://github.com/apache/flink/pull/919
> > [2] https://issues.apache.org/jira/browse/FLINK-2369
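
Regarding points (3) and (6) in the quoted message: if the cause really is
an output file that is still open when cleanup runs, or tests sharing one
'resultPath', then the usual remedy is to give every test its own result
directory and to close each file before deleting it. Below is a minimal
Java sketch of that pattern. It is not taken from Flink's test utilities;
the class IsolatedResultDirs and the methods newResultDir, writeResult and
deleteRecursively are hypothetical names chosen only for illustration.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Flink: one result directory per test,
// files closed before cleanup.
class IsolatedResultDirs {

    /** Creates a fresh, uniquely named directory; testName is only a readable prefix. */
    public static Path newResultDir(String testName) throws IOException {
        return Files.createTempDirectory(testName + "-result-");
    }

    /** Writes the lines and closes the file before returning (try-with-resources). */
    public static Path writeResult(Path dir, String fileName, List<String> lines)
            throws IOException {
        Path file = dir.resolve(fileName);
        try (BufferedWriter writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            for (String line : lines) {
                writer.write(line);
                writer.newLine();
            }
        }
        return file;
    }

    /** Deletes a directory tree, children before their parents. */
    public static void deleteRecursively(Path root) throws IOException {
        List<Path> paths;
        // Close the directory walk before deleting anything.
        try (Stream<Path> walk = Files.walk(root)) {
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dirA = newResultDir("testA");
        Path dirB = newResultDir("testB");
        writeResult(dirA, "1", Arrays.asList("x", "y"));
        writeResult(dirB, "1", Arrays.asList("z"));
        deleteRecursively(dirA);
        deleteRecursively(dirB);
        System.out.println("cleaned up: " + (!Files.exists(dirA) && !Files.exists(dirB)));
    }
}
```

Because each call to newResultDir returns a different path, two tests that
happen to run in the same JVM cannot overwrite each other's output, and
closing the writer before cleanup avoids the case where Windows refuses to
delete a file that still has an open handle.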