Date: Thu, 1 Jun 2017 03:52:04 +0000 (UTC)
From: "Simon Zhou (JIRA)"
To: commits@cassandra.apache.org
Subject: [jira] [Commented] (CASSANDRA-13555) Thread leak during repair

    [ https://issues.apache.org/jira/browse/CASSANDRA-13555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16032401#comment-16032401 ]

Simon Zhou commented on CASSANDRA-13555:
----------------------------------------

Thanks [~tjake] for the comment. I'll be working on the patch, but I'm not sure that is the best fix, for two reasons:

1. The "executor" is created in RepairRunnable and runs all RepairJobs for a given keyspace. It is not a single RepairSession instance's responsibility to stop that executor, nor does it hold a reference to it.

2. The bigger problem is: why do we handle "node down" in RepairSession at all? IMHO it should be handled at a higher level. That means that once an endpoint is down, we should stop all RepairRunnables. There is room for refinement, e.g. stopping only the affected RepairSessions (token ranges), but we are not doing that today and it deserves a separate change. What do you think?
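To make the second point concrete, below is a very rough sketch of what higher-level handling could look like: a registry of running repair commands that the failure detector can ask to cancel everything touching a dead endpoint. All class, method and field names in it are made up, it uses only the JDK, and it is just an illustration, not existing Cassandra code or the proposed patch.

{code}
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArraySet;
import java.util.concurrent.Future;

// Hypothetical coordinator-level registry (NOT existing Cassandra code): track the
// Future of every running repair command together with the endpoints it touches,
// so a "node down" event can cancel every affected command instead of being
// handled inside an individual RepairSession.
public class RepairCommandRegistry
{
    private static final class Entry
    {
        final Future<?> command;       // the running repair command (e.g. a RepairRunnable)
        final Set<String> endpoints;   // endpoints the command involves

        Entry(Future<?> command, Set<String> endpoints)
        {
            this.command = command;
            this.endpoints = endpoints;
        }
    }

    private final Map<Integer, Entry> running = new ConcurrentHashMap<>();

    public void register(int commandId, Future<?> command, Set<String> endpoints)
    {
        running.put(commandId, new Entry(command, new CopyOnWriteArraySet<>(endpoints)));
    }

    public void unregister(int commandId)
    {
        running.remove(commandId);
    }

    // Called from the failure detector / gossip layer when an endpoint is convicted.
    public void onEndpointDown(String endpoint)
    {
        for (Map.Entry<Integer, Entry> e : running.entrySet())
        {
            if (e.getValue().endpoints.contains(endpoint))
            {
                // Cancel the whole command; a refinement would cancel only the
                // sessions whose token ranges involve the dead endpoint.
                e.getValue().command.cancel(true);
                running.remove(e.getKey());
            }
        }
    }
}
{code}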
I know there are bigger changes coming in 4.0, but I don't want a band-aid fix that just makes things messy.

> Thread leak during repair
> -------------------------
>
>                 Key: CASSANDRA-13555
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13555
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Simon Zhou
>            Assignee: Simon Zhou
>
> The symptom is similar to what happened in [CASSANDRA-13204 | https://issues.apache.org/jira/browse/CASSANDRA-13204]: a thread waits forever doing nothing. This one happened during "nodetool repair -pr -seq -j 1" in production, but I can easily reproduce the problem with just "nodetool repair" in a dev environment (CCM). Below I try to explain what happened, based on the 3.0.13 code base.
> 1. One node went down while doing repair. This is the error I saw in production:
> {code}
> ERROR [GossipTasks:1] 2017-05-19 15:00:10,545 RepairSession.java:334 - [repair #bc9a3cd1-3ca3-11e7-a44a-e30923ac9336] session completed with the following error
> java.io.IOException: Endpoint /10.185.43.15 died
>       at org.apache.cassandra.repair.RepairSession.convict(RepairSession.java:333) ~[apache-cassandra-3.0.11.jar:3.0.11]
>       at org.apache.cassandra.gms.FailureDetector.interpret(FailureDetector.java:306) [apache-cassandra-3.0.11.jar:3.0.11]
>       at org.apache.cassandra.gms.Gossiper.doStatusCheck(Gossiper.java:766) [apache-cassandra-3.0.11.jar:3.0.11]
>       at org.apache.cassandra.gms.Gossiper.access$800(Gossiper.java:66) [apache-cassandra-3.0.11.jar:3.0.11]
>       at org.apache.cassandra.gms.Gossiper$GossipTask.run(Gossiper.java:181) [apache-cassandra-3.0.11.jar:3.0.11]
>       at org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:118) [apache-cassandra-3.0.11.jar:3.0.11]
>       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_121]
>       at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_121]
>       at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_121]
>       at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_121]
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_121]
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_121]
>       at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) [apache-cassandra-3.0.11.jar:3.0.11]
>       at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_121]
> {code}
> 2. At this moment the repair coordinator still hasn't received the response (MerkleTrees) from the node that was marked down. This means RepairJob#run will never return, because it waits for the validations to finish:
> {code}
> // Wait for validation to complete
> Futures.getUnchecked(validations);
> {code}
> Note that all RepairJobs (as Runnables) run on a shared executor created in RepairRunnable#runMayThrow, while all snapshotting, validation and syncing happen on a per-RepairSession "taskExecutor". RepairJob#run only returns once it has received MerkleTrees (or null) from all endpoints for the given column family and token range.
> As evidence of the thread leak, below is an excerpt from the thread dump. I can get the same stack trace when simulating the issue in a dev environment.
> {code}
> "Repair#129:56" #406373 daemon prio=5 os_prio=0 tid=0x00007fc495028400 nid=0x1a77d waiting on condition [0x00007fc021530000]
>    java.lang.Thread.State: WAITING (parking)
>       at sun.misc.Unsafe.park(Native Method)
>       - parking to wait for <0x00000002d7c00198> (a com.google.common.util.concurrent.AbstractFuture$Sync)
>       at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
>       at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:285)
>       at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
>       at com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:137)
>       at com.google.common.util.concurrent.Futures.getUnchecked(Futures.java:1509)
>       at org.apache.cassandra.repair.RepairJob.run(RepairJob.java:160)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79)
>       at org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$4/725832346.run(Unknown Source)
>       at java.lang.Thread.run(Thread.java:745)
>    Locked ownable synchronizers:
>       - <0x00000002d7c00230> (a java.util.concurrent.ThreadPoolExecutor$Worker)
> {code}
> So there are two things here:
> 1. For the thread leak itself, either we do something like the following in RepairSession#terminate, or we use a timed wait at the end of RepairJob#run:
> {code}
> for (ValidationTask validationTask : validating.values()) {
>     validationTask.treesReceived(null);
> }
> validating.clear();
> {code}
> 2. A separate question: why do we only wait for validation, instead of also waiting for synchronization (SyncTask) to finish? Is it because we want to speed things up, and we have throttling on streaming anyway?
> [~yukim] I'd love to get your comments. I'll check whether this issue exists in other versions.
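As for the first option (completing the outstanding ValidationTasks with null), it works because the stuck "Repair#..." thread is parked on an uncompleted future (the dump above shows it inside AbstractFuture via Futures.getUnchecked), and completing that future from any other thread releases it. A minimal, self-contained demonstration of that mechanism using a plain Guava SettableFuture, not the actual Cassandra classes:

{code}
import com.google.common.util.concurrent.Futures;
import com.google.common.util.concurrent.SettableFuture;

// Minimal demonstration (plain Guava, not Cassandra code) of the unblocking
// mechanism: Futures.getUnchecked() parks the calling thread until the future
// is completed, and completing it with null from another thread releases it.
// This is what completing each outstanding ValidationTask would do for the
// stuck "Repair#..." worker above.
public final class UnblockDemo
{
    public static void main(String[] args) throws Exception
    {
        SettableFuture<Object> validation = SettableFuture.create();

        Thread repairJob = new Thread(() -> {
            // Parks here, just like RepairJob.run() at Futures.getUnchecked(validations).
            Object trees = Futures.getUnchecked(validation);
            System.out.println("unblocked, received: " + trees);
        }, "Repair#demo");
        repairJob.start();

        Thread.sleep(1000);
        validation.set(null); // releases the parked thread
        repairJob.join();
    }
}
{code}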
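A rough sketch of the second option quoted above, a timed wait at the end of RepairJob#run. The helper class, method name and the one-hour timeout below are illustrative assumptions only, not part of any actual patch:

{code}
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Sketch of the timed-wait alternative: instead of an unbounded
// Futures.getUnchecked(validations) at the end of RepairJob#run, bound the wait
// and fail the job on timeout so the worker thread is not leaked. The class name,
// method name and the one-hour timeout are made-up placeholders.
public final class TimedValidationWait
{
    private static final long VALIDATION_TIMEOUT_MINUTES = 60;

    public static <T> List<T> await(Future<List<T>> validations)
    {
        try
        {
            return validations.get(VALIDATION_TIMEOUT_MINUTES, TimeUnit.MINUTES);
        }
        catch (TimeoutException e)
        {
            // Propagate a failure instead of parking the "Repair#..." thread forever.
            throw new RuntimeException("Timed out waiting for validation responses", e);
        }
        catch (InterruptedException e)
        {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        catch (ExecutionException e)
        {
            throw new RuntimeException(e.getCause());
        }
    }
}
{code}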