From jira-return-11134-archive-asf-public=cust-asf.ponee.io@kafka.apache.org Sat Mar 24 16:40:06 2018 Return-Path: X-Original-To: archive-asf-public@cust-asf.ponee.io Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by mx-eu-01.ponee.io (Postfix) with SMTP id 820EF18064F for ; Sat, 24 Mar 2018 16:40:05 +0100 (CET) Received: (qmail 26026 invoked by uid 500); 24 Mar 2018 15:40:04 -0000 Mailing-List: contact jira-help@kafka.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: jira@kafka.apache.org Delivered-To: mailing list jira@kafka.apache.org Received: (qmail 26015 invoked by uid 99); 24 Mar 2018 15:40:04 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd1-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 24 Mar 2018 15:40:04 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd1-us-west.apache.org (ASF Mail Server at spamd1-us-west.apache.org) with ESMTP id 1F2BFC1363 for ; Sat, 24 Mar 2018 15:40:04 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd1-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: -109.511 X-Spam-Level: X-Spam-Status: No, score=-109.511 tagged_above=-999 required=6.31 tests=[ENV_AND_HDR_SPF_MATCH=-0.5, KAM_ASCII_DIVIDERS=0.8, RCVD_IN_DNSWL_MED=-2.3, SPF_PASS=-0.001, T_RP_MATCHES_RCVD=-0.01, USER_IN_DEF_SPF_WL=-7.5, USER_IN_WHITELIST=-100] autolearn=disabled Received: from mx1-lw-eu.apache.org ([10.40.0.8]) by localhost (spamd1-us-west.apache.org [10.40.0.7]) (amavisd-new, port 10024) with ESMTP id ime54MyiPpz9 for ; Sat, 24 Mar 2018 15:40:02 +0000 (UTC) Received: from mailrelay1-us-west.apache.org (mailrelay1-us-west.apache.org [209.188.14.139]) by mx1-lw-eu.apache.org (ASF Mail Server at mx1-lw-eu.apache.org) with ESMTP id 5867C5F126 for ; Sat, 24 Mar 2018 15:40:01 +0000 (UTC) Received: from jira-lw-us.apache.org (unknown [207.244.88.139]) by mailrelay1-us-west.apache.org (ASF Mail Server at mailrelay1-us-west.apache.org) with ESMTP id 7D850E00C4 for ; Sat, 24 Mar 2018 15:40:00 +0000 (UTC) Received: from jira-lw-us.apache.org (localhost [127.0.0.1]) by jira-lw-us.apache.org (ASF Mail Server at jira-lw-us.apache.org) with ESMTP id 23CED214EE for ; Sat, 24 Mar 2018 15:40:00 +0000 (UTC) Date: Sat, 24 Mar 2018 15:40:00 +0000 (UTC) From: "ASF GitHub Bot (JIRA)" To: jira@kafka.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (KAFKA-6710) Streams integration tests hang during shutdown MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/KAFKA-6710?page=3Dcom.atlassian= .jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=3D1641= 2663#comment-16412663 ]=20 ASF GitHub Bot commented on KAFKA-6710: --------------------------------------- rajinisivaram opened a new pull request #4771: KAFKA-6710: Remove Thread.sl= eep from LogManager.deleteLogs URL: https://github.com/apache/kafka/pull/4771 =20 =20 `Thread.sleep` in `LogManager.deleteLogs` potentially blocks a scheduler= thread for up to `log.segment.delete.delay.ms` with a default value of a m= inute. This PR skips delete when the first log is not yet ready to be delet= ed, freeing the scheduler thread. Logs are then deleted on the next delete = iteration. =20 ### Committer Checklist (excluded from commit message) - [ ] Verify design and implementation=20 - [ ] Verify test coverage and CI build status - [ ] Verify documentation (including upgrade notes) =20 ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. =20 For queries about this service, please contact Infrastructure at: users@infra.apache.org > Streams integration tests hang during shutdown > ---------------------------------------------- > > Key: KAFKA-6710 > URL: https://issues.apache.org/jira/browse/KAFKA-6710 > Project: Kafka > Issue Type: Bug > Components: core, streams > Affects Versions: 1.1.0 > Reporter: Rajini Sivaram > Assignee: Rajini Sivaram > Priority: Major > > Builds have been timing out a lot recently and many of the logs show stre= ams integration tests being run, but not completed. While running tests loc= ally, I saw a failure during shutdown of {{TableTableJoinIntegrationTest}}.= The test was stuck waiting for a broker to shutdown when a {{KafkaSchedule= r}} was attemping to delete logs. KAFKA-6624 (Commit #1ea07b993d75ed68f4c04= 282eb177bf84156e0b2) added a _Thread.sleep_ to wait for the time to delete = each log segment inside the scheduled delete task. The failing streams test= had 62 logs to delete and since MockTime doesn't get updated during the te= st, it would have waited for 62 minutes to complete. This blocks shutdown o= f the broker for 62 minutes. This is an issue if a streams integration test= takes more than 30 seconds when the first delayed delete task is scheduled= to be run. > Changing _Thread.sleep_ to _time.sleep_ fixes this test issue. But it wil= l be good to know why we have a _sleep_ on a _Scheduler_ at all. With the d= efault _log.segment.delete.delay.ms_ of one minute, this potentially blocks= a scheduler thread for upto a minute when there are logs to be deleted. Co= uldn't we just break out of the loop if it is not yet time to delete the fi= rst log segment in the list? The log would then get deleted when the broker= checks next time. [~junrao] [~lindong] ? > =C2=A0 > *Stack trace from failing test*: > {{"kafka-scheduler-8" daemon prio=3D5 tid=3D0x00007fe58dc16800 nid=3D0x96= 03 waiting on condition [0x0000700003f25000]}} > {{=C2=A0=C2=A0 java.lang.Thread.State: TIMED_WAITING (sleeping)}} > {{=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 at java.lang.Thread.sleep(Na= tive Method)}} > {{=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 at kafka.log.LogManager.kafk= a$log$LogManager$$deleteLogs(LogManager.scala:717)}} > {{=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 at kafka.log.LogManager$$ano= nfun$3.apply$mcV$sp(LogManager.scala:406)}} > {{=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 at kafka.utils.KafkaSchedule= r$$anonfun$1.apply$mcV$sp(KafkaScheduler.scala:110)}} > {{=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 at kafka.utils.CoreUtils$$an= on$1.run(CoreUtils.scala:62)}} > {{=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 at java.util.concurrent.Exec= utors$RunnableAdapter.call(Executors.java:471)}} > {{=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 at java.util.concurrent.Futu= reTask.run(FutureTask.java:262)}} > {{=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 at java.util.concurrent.Sche= duledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolE= xecutor.java:178)}} > {{=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 at java.util.concurrent.Sche= duledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor= .java:292)}} > {{=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 at java.util.concurrent.Thre= adPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)}} > {{=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 at java.util.concurrent.Thre= adPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)}} > {{=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 at java.lang.Thread.run(Thre= ad.java:745)}}{{}} > {{}}{{"Test worker" prio=3D5 tid=3D0x00007fe58db72000 nid=3D0x5203 waitin= g on condition [0x0000700001cbd000]}} > {{=C2=A0=C2=A0 java.lang.Thread.State: TIMED_WAITING (parking)}} > {{=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 at sun.misc.Unsafe.park(Nati= ve Method)}} > {{=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 - parking to wait for=C2=A0 = <0x0000000780fb8918> (a java.util.concurrent.locks.AbstractQueuedSynchroniz= er$ConditionObject)}} > {{=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 at java.util.concurrent.lock= s.LockSupport.parkNanos(LockSupport.java:226)}} > {{=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 at java.util.concurrent.lock= s.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynch= ronizer.java:2082)}} > {{=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 at java.util.concurrent.Thre= adPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1468)}} > {{=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 at kafka.utils.KafkaSchedule= r.shutdown(KafkaScheduler.scala:98)}} > {{=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 at kafka.server.KafkaServer$= $anonfun$shutdown$5.apply$mcV$sp(KafkaServer.scala:569)}} > {{=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 at kafka.utils.CoreUtils$.sw= allow(CoreUtils.scala:85)}} > {{=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 at kafka.server.KafkaServer.= shutdown(KafkaServer.scala:569)}} > {{=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 at org.apache.kafka.streams.= integration.utils.KafkaEmbedded.stop(KafkaEmbedded.java:129)}} > {{=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 at org.apache.kafka.streams.= integration.utils.EmbeddedKafkaCluster.stop(EmbeddedKafkaCluster.java:126)}= } > {{=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 at org.apache.kafka.streams.= integration.utils.EmbeddedKafkaCluster.after(EmbeddedKafkaCluster.java:158)= }} > {{=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 at org.junit.rules.ExternalR= esource$1.evaluate(ExternalResource.java:50)}} > {{=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 at org.junit.rules.RunRules.= evaluate(RunRules.java:20)}} -- This message was sent by Atlassian JIRA (v7.6.3#76005)