flume-user mailing list archives

From iain wright <iainw...@gmail.com>
Subject Re: Flume consumes all memory - { OutOfMemoryError: GC overhead limit exceeded }
Date Wed, 26 Jul 2017 21:24:14 GMT
Config seems sane.

If you run "ps auxww | grep -i flume", do you see the java process started
with your -Xms/-Xmx flags?
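For example, a quick way to pull just the heap flags out of the process's command line (the ps output below is inlined as a sample so the snippet is self-contained; on a live box you'd pipe the real ps output through the same grep):

```shell
# Sample command line as ps might show it (illustrative, not from a real host):
ps_line='java -Xms2048m -Xmx3072m -cp /usr/hdp/current/flume-server/lib/* org.apache.flume.node.Application'
# Extract any -Xms/-Xmx flags; '--' stops grep from treating the pattern as an option
heap_flags=$(echo "$ps_line" | grep -o -- '-Xm[sx][0-9]*[mg]')
echo "$heap_flags"
```

If the flags don't show up here, the agent is running on the JVM defaults regardless of what the env file says.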

I increased the heap and added JMX by adding this to flume-env.sh in the
flume conf dir:
JAVA_OPTS="-Xms2048m -Xmx3072m -Dcom.sun.management.jmxremote"
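A quick sanity check on the env file itself (a minimal sketch; the /tmp path is only illustrative — flume-ng sources flume-env.sh from the conf dir passed with -c):

```shell
# Write a throwaway flume-env.sh and confirm that sourcing it yields the expected opts
cat > /tmp/flume-env.sh <<'EOF'
JAVA_OPTS="-Xms2048m -Xmx3072m -Dcom.sun.management.jmxremote"
EOF
. /tmp/flume-env.sh
echo "$JAVA_OPTS"
```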

If you enable JMX you can also get more info about the heap allocation and
take a heap dump with jvisualvm. It seems most likely those flags aren't
reaching the JVM.

Regarding monitoring, you can add -Dflume.monitoring.type=HTTP
-Dflume.monitoring.port=34548 to expose the metrics endpoint at
http://<host>:34548/metrics
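That endpoint returns a flat JSON object keyed by component (e.g. CHANNEL.Channel1) with counters such as ChannelSize and ChannelFillPercentage. A minimal sketch of pulling one counter out without jq — the JSON below is an abridged illustrative sample, not real output from this agent:

```shell
# Illustrative sample of the metrics JSON shape (abridged):
metrics='{"CHANNEL.Channel1":{"ChannelFillPercentage":"0.9","ChannelSize":"45","EventPutSuccessCount":"5000"}}'
# Crude extraction of the fill percentage without jq:
fill=$(echo "$metrics" | grep -o '"ChannelFillPercentage":"[^"]*"')
echo "$fill"
```

Watching ChannelFillPercentage climb toward 100 in the period before the OOM is usually the tell that a sink can't keep up.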





-- 
Iain Wright

This email message is confidential, intended only for the recipient(s)
named above and may contain information that is privileged, exempt from
disclosure under applicable law. If you are not the intended recipient, do
not disclose or disseminate the message to anyone except the intended
recipient. If you have received this message in error, or are not the named
recipient(s), please immediately notify the sender by return email, and
delete all copies of this message.

On Wed, Jul 26, 2017 at 1:24 PM, Anantharaman, Srinatha (Contractor) <
Srinatha_Anantharaman@comcast.com> wrote:

> Iain,
>
>
>
> I am using a file channel. The source is spoolDir and the sinks are Solr and HDFS.
>
> Please find my config below:
>
>
>
> #Flume Configuration Starts
>
> agent.sources = SpoolDirSrc
> agent.channels = Channel1 Channel2
> agent.sinks = SolrSink HDFSsink
>
> # Configure Source
> agent.sources.SpoolDirSrc.channels = Channel1 Channel2
> agent.sources.SpoolDirSrc.type = spooldir
> #agent.sources.SpoolDirSrc.spoolDir = /app/home/solr/sources_tmp2
> #agent.sources.SpoolDirSrc.spoolDir = /app/home/eventsvc/source/processed_emails/
> agent.sources.SpoolDirSrc.spoolDir = /app/home/eventsvc/source/processed_emails2/
> agent.sources.SpoolDirSrc.basenameHeader = true
> agent.sources.SpoolDirSrc.selector.type = replicating
> #agent.sources.SpoolDirSrc.batchSize = 100000
> agent.sources.SpoolDirSrc.fileHeader = true
> #agent.sources.src1.fileSuffix = .COMPLETED
> agent.sources.SpoolDirSrc.deserializer = org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder
>
> # Use a channel that buffers events in file
> agent.channels.Channel1.type = file
> agent.channels.Channel2.type = file
> agent.channels.Channel1.capacity = 5000
> agent.channels.Channel2.capacity = 5000
> agent.channels.Channel1.transactionCapacity = 5000
> agent.channels.Channel2.transactionCapacity = 5000
> agent.channels.Channel1.checkpointDir = /app/home/flume/.flume/file-channel/checkpoint1
> agent.channels.Channel2.checkpointDir = /app/home/flume/.flume/file-channel/checkpoint2
> agent.channels.Channel1.dataDirs = /app/home/flume/.flume/file-channel/data1
> agent.channels.Channel2.dataDirs = /app/home/flume/.flume/file-channel/data2
> #agent.channels.Channel.transactionCapacity = 10000
>
> # Configure Solr Sink
> agent.sinks.SolrSink.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink
> agent.sinks.SolrSink.morphlineFile = /etc/flume/conf/morphline.conf
> agent.sinks.SolrSink.batchsize = 10
> agent.sinks.SolrSink.batchDurationMillis = 10
> agent.sinks.SolrSink.channel = Channel1
> agent.sinks.SolrSink.morphlineId = morphline1
> agent.sinks.SolrSink.tika.config = tikaConfig.xml
> #agent.sinks.SolrSink.fileType = DataStream
> #agent.sinks.SolrSink.hdfs.batchsize = 5
> agent.sinks.SolrSink.rollCount = 0
> agent.sinks.SolrSink.rollInterval = 0
> #agent.sinks.SolrSink.rollsize = 100000000
> agent.sinks.SolrSink.idleTimeout = 0
> #agent.sinks.SolrSink.txnEventMax = 5000
>
> # Configure HDFS Sink
> agent.sinks.HDFSsink.channel = Channel2
> agent.sinks.HDFSsink.type = hdfs
> #agent.sinks.HDFSsink.hdfs.path = hdfs://codehdplak-po-r10p.sys.comcast.net:8020/user/solr/emails
> agent.sinks.HDFSsink.hdfs.path = hdfs://codehann/user/solr/emails
> #agent.sinks.HDFSsink.hdfs.fileType = DataStream
> agent.sinks.HDFSsink.hdfs.fileType = CompressedStream
> agent.sinks.HDFSsink.hdfs.batchsize = 1000
> agent.sinks.HDFSsink.hdfs.rollCount = 0
> agent.sinks.HDFSsink.hdfs.rollInterval = 0
> agent.sinks.HDFSsink.hdfs.rollsize = 10485760
> agent.sinks.HDFSsink.hdfs.idleTimeout = 0
> agent.sinks.HDFSsink.hdfs.maxOpenFiles = 1
> agent.sinks.HDFSsink.hdfs.filePrefix = %{basename}
> agent.sinks.HDFSsink.hdfs.codeC = gzip
>
> agent.sources.SpoolDirSrc.channels = Channel1 Channel2
> agent.sinks.SolrSink.channel = Channel1
> agent.sinks.HDFSsink.channel = Channel2
>
>
>
> Morphline config:
>
>
>
>
>
> solrLocator: {
>   collection : esearch
>   #zkHost : "127.0.0.1:9983"
>   #zkHost : "codesolr-as-r1p.sys.comcast.net:2181,codesolr-as-r2p.sys.comcast.net:2182"
>   #zkHost : "codesolr-as-r2p:2181"
>   zkHost : "codesolr-wc-r1p.sys.comcast.net:2181,codesolr-wc-r2p.sys.comcast.net:2181,codesolr-wc-r3p.sys.comcast.net:2181"
> }
>
> morphlines :
> [
>   {
>     id : morphline1
>     importCommands : ["org.kitesdk.**", "org.apache.solr.**"]
>     commands :
>     [
>       { detectMimeType { includeDefaultMimeTypes : true } }
>       {
>         solrCell {
>           solrLocator : ${solrLocator}
>           captureAttr : true
>           lowernames : true
>           capture : [_attachment_body, _attachment_mimetype, basename, content, content_encoding, content_type, file, meta, text]
>           parsers : [
>             #{ parser : org.apache.tika.parser.txt.TXTParser }
>             #{ parser : org.apache.tika.parser.AutoDetectParser }
>             #{ parser : org.apache.tika.parser.asm.ClassParser }
>             #{ parser : org.gagravarr.tika.FlacParser }
>             #{ parser : org.apache.tika.parser.executable.ExecutableParser }
>             #{ parser : org.apache.tika.parser.font.TrueTypeParser }
>             #{ parser : org.apache.tika.parser.xml.XMLParser }
>             #{ parser : org.apache.tika.parser.html.HtmlParser }
>             #{ parser : org.apache.tika.parser.image.TiffParser }
>             #{ parser : org.apache.tika.parser.mail.RFC822Parser }
>             #{ parser : org.apache.tika.parser.mbox.MboxParser, additionalSupportedMimeTypes : [message/x-emlx] }
>             #{ parser : org.apache.tika.parser.microsoft.OfficeParser }
>             #{ parser : org.apache.tika.parser.hdf.HDFParser }
>             #{ parser : org.apache.tika.parser.odf.OpenDocumentParser }
>             #{ parser : org.apache.tika.parser.pdf.PDFParser }
>             #{ parser : org.apache.tika.parser.rtf.RTFParser }
>             { parser : org.apache.tika.parser.txt.TXTParser }
>             #{ parser : org.apache.tika.parser.chm.ChmParser }
>           ]
>           fmap : { content : text }
>         }
>       }
>       { generateUUID { field : id } }
>       { sanitizeUnknownSolrFields { solrLocator : ${solrLocator} } }
>       { logDebug { format : "output record: {}", args : ["@{}"] } }
>       { loadSolr: { solrLocator : ${solrLocator} } }
>     ]
>   }
> ]
>
>
>
> I am not sure how I can get the Flume metrics.
>
> Thank you for looking into it
>
>
>
> Regards,
>
> ~Sri
>
>
>
> *From:* iain wright [mailto:iainwrig@gmail.com]
> *Sent:* Wednesday, July 26, 2017 2:37 PM
> *To:* user@flume.apache.org
> *Subject:* Re: Flume consumes all memory - { OutOfMemoryError: GC
> overhead limit exceeded }
>
>
>
> Hi Sri,
>
>
>
> Are you using a memory channel? What source/sink?
>
>
>
> Can you please paste/link your obfuscated config
>
>
>
> What does the metrics endpoint say in terms of channel size,
> sinkdrainsuccess etc, for the period leading up to the OOM?
>
>
>
> Best,
>
> Iain
>
>
> Sent from my iPhone
>
>
> On Jul 26, 2017, at 8:00 AM, Anantharaman, Srinatha (Contractor) <
> Srinatha_Anantharaman@comcast.com> wrote:
>
> Hi All,
>
>
>
> Though I have set the -Xms and -Xmx values, Flume is consuming all the
> memory and eventually failing.
>
>
>
> I have tried adding the above parameters on the command line as below:
>
>
>
> a. /usr/hdp/current/flume-server/bin/flume-ng agent -c /etc/flume/conf -f /etc/flume/conf/flumeSolr.conf -n agent -Dproperty="-Xms1024m -Xmx4048m"
>
> b. /usr/hdp/current/flume-server/bin/flume-ng agent -c /etc/flume/conf -f /etc/flume/conf/flumeSolr.conf -n agent -Xms1024m -Xmx4048m
>
>
>
> And also via the flume-env.sh file as below:
>
>
>
> export JAVA_OPTS="-Xms2048m -Xmx4048m -Dcom.sun.management.jmxremote -XX:+UseParNewGC -XX:+UseConcMarkSweepGC"
>
>
>
> I am using HDP 2.5 and Flume 1.5.2.2.5.
>
>
>
> Kindly let me know how to resolve this issue
>
>
>
> Regards,
>
> ~Sri
>
>
