flume-user mailing list archives

From Tim Driscoll <timothy.drisc...@gmail.com>
Subject Re: HDFSEventSink Memory Leak Workarounds
Date Wed, 22 May 2013 16:20:14 GMT
Sounds like the expected behavior to me based on the message, though it's a
little confusing that it surfaces as an IOException.

Somewhat related: we probably had our idleTimeout set too low, so the files
were closing pretty often.  This was causing a memory leak for us; from what
I can tell, it's due to FLUME-1864.  So I think it may be a good idea to
bump up the idleTimeout if you're constantly closing idle files.  I could
be wrong, though; I'd defer to the developers. :)
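
For example, something along these lines in the sink config (the sink name
and the timeout value here are purely illustrative; the idea is just to pick
a timeout comfortably longer than the usual gap between writes to a bucket):

# close idle files, but not so aggressively that writers constantly churn
agent.sinks.my-hdfs-sink.hdfs.idleTimeout = 900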


On Wed, May 22, 2013 at 8:58 AM, Paul Chavez <pchavez@verticalsearchworks.com> wrote:

>
> This thread reminded me to check my configs since I use a low idleTimeout
> and bucket events by hour. Turned out I still had the default rollInterval
> set so I disabled that and updated my configs.
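>
> For illustration, the shape of that change looks roughly like this (the
> sink name and values are hypothetical, not my actual config):
>
> # rely on idleTimeout rather than a timed roll
> agent.sinks.web-hdfs-sink.hdfs.rollInterval = 0
> agent.sinks.web-hdfs-sink.hdfs.idleTimeout = 60
> agent.sinks.web-hdfs-sink.hdfs.path = /flume/WebLogs/datekey=%Y%m%d/hour=%H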
>
> Now I see a lot of exceptions logged as warnings immediately following an
> idleTimeout:
>
> 8:55:40.663 AM INFO org.apache.flume.sink.hdfs.BucketWriter
> Closing idle bucketWriter
> /flume/WebLogs/datekey=20130522/hour=08/FlumeData.1369238128886.tmp
> 8:55:40.675 AM INFO org.apache.flume.sink.hdfs.BucketWriter
> Renaming
> /flume/WebLogs/datekey=20130522/hour=08/FlumeData.1369238128886.tmp to
> /flume/WebLogs/datekey=20130522/hour=08/FlumeData.1369238128886
> 8:55:40.677 AM WARN org.apache.flume.sink.hdfs.HDFSEventSink
> HDFS IO error
> java.io.IOException: This bucket writer was closed due to idling and this
> handle is thus no longer valid
>  at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:391)
>  at org.apache.flume.sink.hdfs.HDFSEventSink$2.call(HDFSEventSink.java:729)
>  at org.apache.flume.sink.hdfs.HDFSEventSink$2.call(HDFSEventSink.java:727)
>  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>  at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
>  at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
>  at java.lang.Thread.run(Thread.java:662)
>
> Given that these are logged at WARN, I have been assuming they are benign
> errors.  Is that assumption correct?
>
> thanks,
> Paul Chavez
>
>  ------------------------------
> *From:* Connor Woodson [mailto:cwoodson.dev@gmail.com]
> *Sent:* Tuesday, May 21, 2013 2:13 PM
> *To:* user@flume.apache.org
> *Subject:* Re: HDFSEventSink Memory Leak Workarounds
>
>  The other property you will want to look at is maxOpenFiles, which is
> the number of file/paths held in memory at one time.
>
> If you search for the email thread with subject "hdfs.idleTimeout ,what's
> it used for ?" from back in January you will find a discussion along these
> lines. As a quick summary, if rollInterval is not set to 0, you should
> avoid using idleTimeout and should set maxOpenFiles to a reasonable number
> (the default is 500, which is too large; I think that default is being
> changed for 1.4).
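>
> A minimal sketch of that combination (the sink name and exact values are
> illustrative only):
>
> # roll on a timer instead of on idleness
> agent.sinks.my-hdfs-sink.hdfs.rollInterval = 3600
> # hdfs.idleTimeout left unset (default 0, i.e. disabled)
> # bound how many bucket writers are kept in memory at once
> agent.sinks.my-hdfs-sink.hdfs.maxOpenFiles = 50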
>
> - Connor
>
>
> On Tue, May 21, 2013 at 9:59 AM, Tim Driscoll <timothy.driscoll@gmail.com> wrote:
>
>> Hello,
>>
>> We have a Flume Agent (version 1.3.1) set up using the HDFSEventSink.  We
>> noticed we were running out of memory after a few days of running, and we
>> believe we've pinpointed it to an issue with the hdfs.idleTimeout setting.
>> I believe this is fixed in 1.4 per FLUME-1864.
>>
>> Our planned workaround was to just remove the idleTimeout setting, which
>> worked but brought up another issue.  Since we partition our data by
>> timestamp, at midnight we roll over to a new bucket/partition and open new
>> bucket writers, leaving the old bucket writers open.  Ideally the
>> idleTimeout would clean these up.  So instead of a slow, steady leak,
>> we're encountering a 100MB leak every day.
>>
>> Short of upgrading Flume, does anyone know of a configuration workaround
>> for this?  For now we've just bumped up the heap memory, and I'm having to
>> restart our agents every few days, which obviously isn't ideal.
>>
>> Is anyone else seeing issues like this?  Or how do others use the HDFS
>> sink to continuously write large amounts of logs from multiple source
>> hosts?  I can get more in-depth about our setup/environment if necessary.
>>
>> Here's a snippet of one of our 4 HDFS sink configs:
>> agent.sinks.rest-xaction-hdfs-sink.type = hdfs
>> agent.sinks.rest-xaction-hdfs-sink.channel = rest-xaction-chan
>> agent.sinks.rest-xaction-hdfs-sink.hdfs.path = /user/svc-neb/rest_xaction_logs/date=%Y-%m-%d
>> agent.sinks.rest-xaction-hdfs-sink.hdfs.rollCount = 0
>> agent.sinks.rest-xaction-hdfs-sink.hdfs.rollSize = 0
>> agent.sinks.rest-xaction-hdfs-sink.hdfs.rollInterval = 3600
>> agent.sinks.rest-xaction-hdfs-sink.hdfs.idleTimeout = 300
>> agent.sinks.rest-xaction-hdfs-sink.hdfs.batchSize = 1000
>> agent.sinks.rest-xaction-hdfs-sink.hdfs.filePrefix = %{host}
>> agent.sinks.rest-xaction-hdfs-sink.hdfs.fileSuffix = .avro
>> agent.sinks.rest-xaction-hdfs-sink.hdfs.fileType = DataStream
>> agent.sinks.rest-xaction-hdfs-sink.serializer = avro_event
>>
>> -Tim
>>
>
>
