flume-user mailing list archives

From Chia-Hung Lin <cli...@googlemail.com>
Subject Re: Deadlock between roll timer and PollingRunner threads
Date Thu, 09 Feb 2017 10:29:11 GMT
Thanks for the information. I use the default maxOpenFiles value; I
have never set it in my config.

On 8 February 2017 at 15:28, Denes Arvay <denes@cloudera.com> wrote:
> Hi,
>
> Yes, it seems to be a bug; I have also run into it.
> It seems that the conf file poller detects a change in the config file and
> tries to stop the components while, at the same time, the HDFS sink tries
> to roll a file.
> It should be solved by https://issues.apache.org/jira/browse/FLUME-2973
>
> From your thread dump it seems that rolling is triggered by the maxOpenFiles
> limit, is it overridden in your config file? A very low value could increase
> the chances of this deadlock.
>
> I'd also recommend using the --no-reload-conf command line parameter if the
> live config reload feature is not needed.
>
> Kind regards,
> Denes
>
>
>
> On Mon, Feb 6, 2017 at 6:08 PM Chia-Hung Lin <clin4j@googlemail.com> wrote:
>>
>> I am testing Flume 1.6.0 (revision 2561a23240a71ba20bf288c7c2cda88f443c2080)
>> to move files from the local file system to S3. Only one Flume
>> process is launched (a single JVM process). The problem is that, after
>> running for a while, the roll timer and PollingRunner threads
>> deadlock every time. A thread dump is shown below:
>>
>> "hdfs-sk-roll-timer-0":
>>   waiting to lock monitor 0x00007f46c40b5578 (object 0x00000000e002dc90, a java.lang.Object),
>>   which is held by "SinkRunner-PollingRunner-DefaultSinkProcessor"
>> "SinkRunner-PollingRunner-DefaultSinkProcessor":
>>   waiting to lock monitor 0x00007f4684004db8 (object 0x00000000e17b64d8, a org.apache.flume.sink.hdfs.BucketWriter),
>>   which is held by "hdfs-sk-roll-timer-0"
>>
>> Java stack information for the threads listed above:
>> ===================================================
>> "hdfs-sk-roll-timer-0":
>>         at org.apache.flume.sink.hdfs.HDFSEventSink$1.run(HDFSEventSink.java:396)
>>         - waiting to lock <0x00000000e002dc90> (a java.lang.Object)
>>         at org.apache.flume.sink.hdfs.BucketWriter.runCloseAction(BucketWriter.java:447)
>>         at org.apache.flume.sink.hdfs.BucketWriter.close(BucketWriter.java:408)
>>         - locked <0x00000000e17b64d8> (a org.apache.flume.sink.hdfs.BucketWriter)
>>         at org.apache.flume.sink.hdfs.BucketWriter$2.call(BucketWriter.java:280)
>>         at org.apache.flume.sink.hdfs.BucketWriter$2.call(BucketWriter.java:274)
>>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>         at java.lang.Thread.run(Thread.java:745)
>> "SinkRunner-PollingRunner-DefaultSinkProcessor":
>>         at org.apache.flume.sink.hdfs.BucketWriter.close(BucketWriter.java:304)
>>         - waiting to lock <0x00000000e17b64d8> (a org.apache.flume.sink.hdfs.BucketWriter)
>>         at org.apache.flume.sink.hdfs.HDFSEventSink$WriterLinkedHashMap.removeEldestEntry(HDFSEventSink.java:163)
>>         at java.util.LinkedHashMap.addEntry(LinkedHashMap.java:431)
>>         at java.util.HashMap.put(HashMap.java:505)
>>         at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:407)
>>         - locked <0x00000000e002dc90> (a java.lang.Object)
>>         at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
>>         at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
>>         at java.lang.Thread.run(Thread.java:745)
>>
>> Found 1 deadlock.
>>
>> The setting is below:
>>
>> a1.sources = src
>> a1.sinks = sk
>> a1.channels = ch
>> ...
>> a1.sinks.sk.type = hdfs
>> a1.sinks.sk.channel = ch
>> ...
>> a1.sinks.sk.hdfs.fileType = DataStream
>> ...
>> a1.sinks.k1.hdfs.rollCount = 0
>> a1.sinks.k1.hdfs.rollSize = 0
>> a1.sinks.k1.hdfs.rollInterval = 100
>> ...
>> a1.channels.ch.type = file
>> a1.channels.ch.checkpointDir = /path/to/checkpointDir
>> a1.channels.ch.dataDirs = /path/to/dataDir
>>
>> The command to run Flume is
>>
>> nohup ./bin/flume-ng agent --conf conf/ --conf-file test.conf --name a1 ... > /path/to/test.log 2>&1 &
>>
>> Is this a bug, or is there something I can tune to fix it?
>>
>> Thanks
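
For anyone reading this thread later, the cycle in the thread dump above is a
lock-order inversion: the roll timer holds the BucketWriter monitor and waits
for the sink's java.lang.Object monitor, while the PollingRunner holds the
sink's monitor and waits for the BucketWriter. Below is a minimal sketch of
that pattern in plain Java. It is not Flume code and not the FLUME-2973 patch;
the class, method, thread, and lock names are all hypothetical stand-ins, and
the JDK's ThreadMXBean is used only to confirm that the JVM reports the cycle.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

// Hypothetical demo class; none of these names come from Flume.
class LockOrderDemo {

    // Signals arrival, then blocks until both threads hold their first lock.
    private static void arriveAndWait(CountDownLatch latch) {
        latch.countDown();
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Starts two daemon threads that take two monitors in opposite order and
    // returns how many of those two the JVM reports as deadlocked before the
    // timeout expires (0 if no deadlock was observed).
    static int countDeadlockedThreads(long timeoutMillis) throws InterruptedException {
        final Object sinkLock = new Object();   // stands in for the sink's java.lang.Object monitor
        final Object writerLock = new Object(); // stands in for the BucketWriter monitor
        final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        Thread rollTimer = new Thread(() -> {
            synchronized (writerLock) {          // roll timer: writer first ...
                arriveAndWait(bothHoldFirstLock);
                synchronized (sinkLock) { }      // ... then the sink (blocks forever)
            }
        }, "demo-roll-timer");

        Thread pollingRunner = new Thread(() -> {
            synchronized (sinkLock) {            // polling runner: sink first ...
                arriveAndWait(bothHoldFirstLock);
                synchronized (writerLock) { }    // ... then the writer (blocks forever)
            }
        }, "demo-polling-runner");

        // Daemon threads so the stuck pair does not keep the JVM alive.
        rollTimer.setDaemon(true);
        pollingRunner.setDaemon(true);
        rollTimer.start();
        pollingRunner.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            long[] ids = mx.findMonitorDeadlockedThreads(); // null when no cycle exists
            if (ids != null) {
                int count = 0;
                for (long id : ids) {
                    if (id == rollTimer.getId() || id == pollingRunner.getId()) {
                        count++;
                    }
                }
                if (count > 0) {
                    return count;
                }
            }
            Thread.sleep(20);
        }
        return 0;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Deadlocked demo threads: " + countDeadlockedThreads(5000));
    }
}
```

If both code paths took the two monitors in the same order (for example,
always the sink's lock before the writer's), the cycle could not form. Whether
that matches the actual fix should be checked against FLUME-2973 itself.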
