From: Chia-Hung Lin
Date: Thu, 9 Feb 2017 11:29:11 +0100
Subject: Re: Deadlock between roll timer and PollingRunner threads
To: user@flume.apache.org

Thanks for the information. I use the default maxOpenFiles value (in
fact, I don't set it in my config at all).

On 8 February 2017 at 15:28, Denes Arvay wrote:
> Hi,
>
> Yes, it seems to be a bug; I also bumped into it.
> It seems that the conf file poller detects a change in the config file and
> tries to stop the components while, at the same time, the HDFS sink tries
> to roll a file.
> It should be solved by https://issues.apache.org/jira/browse/FLUME-2973
>
> From your thread dump it seems that rolling is triggered by the maxOpenFiles
> limit. Is it overridden in your config file? A very low value could increase
> the chances of this deadlock.
>
> I'd also recommend using the --no-reload-conf command line parameter if the
> live config reload feature is not needed.
>
> Kind regards,
> Denes
>
> On Mon, Feb 6, 2017 at 6:08 PM Chia-Hung Lin wrote:
>>
>> I use Flume 1.6.0 (revision 2561a23240a71ba20bf288c7c2cda88f443c2080)
>> for testing, moving files from the local file system to S3. Only one
>> Flume process is launched (a single JVM process). The problem is that,
>> each time, a deadlock occurs between the roll timer and PollingRunner
>> threads after running for a while.
>> A thread dump is shown below:
>>
>> "hdfs-sk-roll-timer-0":
>>   waiting to lock monitor 0x00007f46c40b5578 (object 0x00000000e002dc90, a java.lang.Object),
>>   which is held by "SinkRunner-PollingRunner-DefaultSinkProcessor"
>> "SinkRunner-PollingRunner-DefaultSinkProcessor":
>>   waiting to lock monitor 0x00007f4684004db8 (object 0x00000000e17b64d8, a org.apache.flume.sink.hdfs.BucketWriter),
>>   which is held by "hdfs-sk-roll-timer-0"
>>
>> Java stack information for the threads listed above:
>> ===================================================
>> "hdfs-sk-roll-timer-0":
>>         at org.apache.flume.sink.hdfs.HDFSEventSink$1.run(HDFSEventSink.java:396)
>>         - waiting to lock <0x00000000e002dc90> (a java.lang.Object)
>>         at org.apache.flume.sink.hdfs.BucketWriter.runCloseAction(BucketWriter.java:447)
>>         at org.apache.flume.sink.hdfs.BucketWriter.close(BucketWriter.java:408)
>>         - locked <0x00000000e17b64d8> (a org.apache.flume.sink.hdfs.BucketWriter)
>>         at org.apache.flume.sink.hdfs.BucketWriter$2.call(BucketWriter.java:280)
>>         at org.apache.flume.sink.hdfs.BucketWriter$2.call(BucketWriter.java:274)
>>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>         at java.lang.Thread.run(Thread.java:745)
>> "SinkRunner-PollingRunner-DefaultSinkProcessor":
>>         at org.apache.flume.sink.hdfs.BucketWriter.close(BucketWriter.java:304)
>>         - waiting to lock <0x00000000e17b64d8> (a org.apache.flume.sink.hdfs.BucketWriter)
>>         at org.apache.flume.sink.hdfs.HDFSEventSink$WriterLinkedHashMap.removeEldestEntry(HDFSEventSink.java:163)
>>         at java.util.LinkedHashMap.addEntry(LinkedHashMap.java:431)
>>         at java.util.HashMap.put(HashMap.java:505)
>>         at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:407)
>>         - locked <0x00000000e002dc90> (a java.lang.Object)
>>         at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
>>         at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
>>         at java.lang.Thread.run(Thread.java:745)
>>
>> Found 1 deadlock.
>>
>> The settings are below:
>>
>> a1.sources = src
>> a1.sinks = sk
>> a1.channels = ch
>> ...
>> a1.sinks.sk.type = hdfs
>> a1.sinks.sk.channel = ch
>> ...
>> a1.sinks.sk.hdfs.fileType = DataStream
>> ...
>> a1.sinks.k1.hdfs.rollCount = 0
>> a1.sinks.k1.hdfs.rollSize = 0
>> a1.sinks.k1.hdfs.rollInterval = 100
>> ...
>> a1.channels.ch.type = file
>> a1.channels.ch.checkpointDir = /path/to/chechkpointDir
>> a1.channels.ch.dataDirs = /path/to/dataDir
>>
>> The command to run Flume is:
>>
>> nohup ./bin/flume-ng agent --conf conf/ --conf-file test.conf --name a1 ... > /path/to/test.log 2>&1 &
>>
>> Is this a bug, or is there something I can tune to fix it?
>>
>> Thanks
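
For readers following the thread dump above: the pattern it shows is a lock-order inversion, where one thread holds the BucketWriter monitor and waits for the sink's java.lang.Object monitor while the other thread holds them the other way round. The following is a minimal sketch of that pattern, not Flume's code. The class name, method names, thread names, and the two lock objects are all hypothetical stand-ins. It also shows how the JVM's own detector (the one behind "Found 1 deadlock" in a thread dump) can be queried from code.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch: two threads take two monitors in opposite order.
class LockInversionSketch {

    /**
     * Starts two daemon threads that acquire the locks in opposite order and
     * returns how many threads the JVM reports as deadlocked (0 on timeout).
     */
    static int provokeAndCountDeadlockedThreads() throws InterruptedException {
        // Stand-ins for the two monitors in the dump (assumed names).
        final Object sinkLock = new Object();    // plays the java.lang.Object monitor
        final Object writerLock = new Object();  // plays the BucketWriter monitor
        final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        Thread rollTimer = new Thread(() -> {
            synchronized (writerLock) {              // writer first ...
                awaitOther(bothHoldFirstLock);
                synchronized (sinkLock) {            // ... then sink
                    // never reached once the cycle forms
                }
            }
        }, "roll-timer-sketch");

        Thread pollingRunner = new Thread(() -> {
            synchronized (sinkLock) {                // sink first ...
                awaitOther(bothHoldFirstLock);
                synchronized (writerLock) {          // ... then writer
                    // never reached once the cycle forms
                }
            }
        }, "polling-runner-sketch");

        // Daemon threads so the stuck pair does not keep the JVM alive.
        rollTimer.setDaemon(true);
        pollingRunner.setDaemon(true);
        rollTimer.start();
        pollingRunner.start();

        // Poll the JVM's deadlock detector for up to about five seconds.
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int i = 0; i < 100; i++) {
            long[] ids = mx.findDeadlockedThreads();
            if (ids != null) {
                return ids.length;
            }
            Thread.sleep(50);
        }
        return 0;
    }

    // Signals that this thread holds its first lock, then waits for the other.
    private static void awaitOther(CountDownLatch latch) {
        latch.countDown();
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("deadlocked threads: " + provokeAndCountDeadlockedThreads());
    }
}
```

In a sketch like this, the cycle disappears if both threads take the two locks in the same order. Whether FLUME-2973 resolves the real issue that way is not stated in this thread.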