flume-user mailing list archives

From Brock Noland <br...@cloudera.com>
Subject Re: process failed - java.lang.OutOfMemoryError
Date Sat, 02 Mar 2013 17:30:50 GMT
Try turning on HeapDumpOnOutOfMemoryError so we can peek at the heap dump.  
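For example (just a sketch - the exact file and dump path are up to you; I'm assuming you set JAVA_OPTS in flume-env.sh and that /mnt/logs/flume-ng, which you already use for gc.log, is writable):

JAVA_OPTS="$JAVA_OPTS -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/mnt/logs/flume-ng/"

The resulting .hprof can be opened in VisualVM or Eclipse MAT to see what is holding on to memory. Keep in mind the dump only covers the Java heap, so if the leak turns out to be in native/direct memory the dump itself may look small.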

-- 
Brock Noland
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


On Friday, March 1, 2013 at 5:57 PM, Denis Lowe wrote:

> process failed - java.lang.OutOfMemoryError
> 
> We observed the following error:
> 01 Mar 2013 21:37:24,807 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.HDFSEventSink.process:460) - process failed
> java.lang.OutOfMemoryError
>         at org.apache.hadoop.io.compress.zlib.ZlibCompressor.init(Native Method)
>         at org.apache.hadoop.io.compress.zlib.ZlibCompressor.<init>(ZlibCompressor.java:222)
>         at org.apache.hadoop.io.compress.GzipCodec$GzipZlibCompressor.<init>(GzipCodec.java:159)
>         at org.apache.hadoop.io.compress.GzipCodec.createCompressor(GzipCodec.java:109)
>         at org.apache.hadoop.io.compress.GzipCodec.createOutputStream(GzipCodec.java:92)
>         at org.apache.flume.sink.hdfs.HDFSCompressedDataStream.open(HDFSCompressedDataStream.java:70)
>         at org.apache.flume.sink.hdfs.BucketWriter.doOpen(BucketWriter.java:216)
>         at org.apache.flume.sink.hdfs.BucketWriter.access$000(BucketWriter.java:53)
>         at org.apache.flume.sink.hdfs.BucketWriter$1.run(BucketWriter.java:172)
>         at org.apache.flume.sink.hdfs.BucketWriter$1.run(BucketWriter.java:170)
>         at org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:143)
>         at org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:170)
>         at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:364)
>         at org.apache.flume.sink.hdfs.HDFSEventSink$2.call(HDFSEventSink.java:729)
>         at org.apache.flume.sink.hdfs.HDFSEventSink$2.call(HDFSEventSink.java:727)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:722)
> 
> Unfortunately the error does not state whether it is due to a lack of Heap, Perm, or Direct memory.
> 
> Looking at the system memory we could see that we were using 3GB of 7GB (i.e. less than half of the physical memory was used).
> 
> Using the VisualVM profiler we could see that we had not maxed out the heap: 75MB of 131MB (allocated).
> PermGen was fine: 16MB of 27MB (allocated).
> 
> Buffer Usage is as follows: 
> Direct Memory:
> < 50MB (this gets freed after each GC)
> 
> Mapped Memory:
> count 9
> 144MB (always stays constant)
> 
> I'm assuming -XX:MaxDirectMemorySize applies to Direct Buffer Memory usage, NOT Mapped Buffer Memory?
> 
> The other thing we noticed was that after a restart the flume process "RES" size starts at around 200MB and then, over the course of a week, grows to 3GB, after which we observed the above error.
> Unfortunately we cannot see where this 3GB of memory is being used when profiling with VisualVM and JConsole (max heap size is set to 256MB) - there definitely appears to be a slow memory leak.
> 
> Flume is the only process running on this server:
> 64bit Centos
> java version "1.6.0_27" (64bit)
> 
> The flume collector is configured with 8 file channels writing to S3 using the HDFS sink (8 upstream servers are pushing events to 2 downstream collectors).
> 
> Each of the 8 channels/sinks is configured as follows:
> ## impression source
> agent.sources.impressions.type = avro
> agent.sources.impressions.bind = 0.0.0.0
> agent.sources.impressions.port = 5001
> agent.sources.impressions.channels = impressions-s3-channel
> ## impression  channel
> agent.channels.impressions-s3-channel.type = file
> agent.channels.impressions-s3-channel.checkpointDir = /mnt/flume-ng/checkpoint/impressions-s3-channel
> agent.channels.impressions-s3-channel.dataDirs = /mnt/flume-ng/data1/impressions-s3-channel,/mnt/flume-ng/data2/impressions-s3-channel
> agent.channels.impressions-s3-channel.maxFileSize = 210000000
> agent.channels.impressions-s3-channel.capacity = 2000000
> agent.channels.impressions-s3-channel.checkpointInterval = 300000
> agent.channels.impressions-s3-channel.transactionCapacity = 10000
> # impression s3 sink
> agent.sinks.impressions-s3-sink.type = hdfs
> agent.sinks.impressions-s3-sink.channel = impressions-s3-channel
> agent.sinks.impressions-s3-sink.hdfs.path = s3n://KEY:SECRET_KEY@S3-PATH
> agent.sinks.impressions-s3-sink.hdfs.filePrefix = impressions-%{collector-host}
> agent.sinks.impressions-s3-sink.hdfs.callTimeout = 0
> agent.sinks.impressions-s3-sink.hdfs.rollInterval = 3600
> agent.sinks.impressions-s3-sink.hdfs.rollSize = 450000000
> agent.sinks.impressions-s3-sink.hdfs.rollCount = 0
> agent.sinks.impressions-s3-sink.hdfs.codeC = gzip
> agent.sinks.impressions-s3-sink.hdfs.fileType = CompressedStream
> agent.sinks.impressions-s3-sink.hdfs.batchSize = 100
> 
> I am using flume-ng 1.3.1 with the following parameters: 
> JAVA_OPTS="-Xms64m -Xmx256m -Xss128k -XX:MaxDirectMemorySize=256m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -verbose:gc -Xloggc:/mnt/logs/flume-ng/gc.log"
> 
> We have 2 collectors running and they both fail at pretty much the same time.
> 
> So from what I can see there appears to be a slow memory leak with the HDFS sink, but I have no idea how to track this down or what alternate configuration I can use to prevent this from happening again.
> 
> Any ideas would be greatly appreciated.
> 

