flume-user mailing list archives

From: Mike Zupan <mike.zu...@manage.com>
Subject: Re: Flume 1.4 High CPU
Date: Wed, 15 Oct 2014 21:51:21 GMT
Ahmed,

I'm pretty new to Hadoop, so I'm doing my best to debug this; I can't pull the event counts yet.
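
Once I'm further along I'd like to pull those counts from Flume's built-in monitoring. As far as I can tell, Flume 1.4 can expose per-component counters as JSON over HTTP if the agent is started with the options below (the port number is just an arbitrary choice on my part):

    -Dflume.monitoring.type=http -Dflume.monitoring.port=34545

    # then, on the agent host:
    curl http://localhost:34545/metrics
    # the JSON should include something like
    # "SOURCE.avro": { "EventReceivedCount": ..., "EventAcceptedCount": ... }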

We are on 15k RPM disks across the board, but your point about uncompressing and then re-compressing led me to what I think is the right track: I'm going to try sending to the Flume servers uncompressed and see if that helps (the change I have in mind is sketched after the top output below). We are getting a lot of CPU iowait when new files come in.

For example:

Cpu0  : 14.8%us, 15.1%sy,  0.0%ni,  2.0%id, 65.1%wa,  0.0%hi,  3.0%si,  0.0%st
Cpu1  :  4.0%us, 39.7%sy,  0.0%ni, 34.0%id, 22.3%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  :  0.3%us, 97.4%sy,  0.0%ni,  2.0%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu3  :  2.3%us, 75.5%sy,  0.0%ni, 13.2%id,  8.9%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu4  :  1.3%us, 51.8%sy,  0.0%ni, 30.9%id, 15.9%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu5  :  0.0%us,100.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu6  :  0.0%us, 99.7%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu7  :  4.0%us, 40.7%sy,  0.0%ni, 41.7%id, 13.6%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu8  :  0.3%us, 99.3%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu9  :  0.0%us,100.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu10 :  0.0%us, 99.7%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu11 :  2.0%us, 72.0%sy,  0.0%ni,  4.0%id, 22.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu12 :  5.0%us, 33.3%sy,  0.0%ni, 26.3%id, 35.3%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu13 :  0.0%us,100.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu14 :  0.0%us,100.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu15 :  0.0%us, 99.7%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu16 :  0.0%us,100.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
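
If uncompressed turns out to help, the change I have in mind is roughly the following (the upstream agent and sink names are placeholders for whatever is feeding this source; its avro sink's compression-type has to match, or the source won't be able to decode the batches):

    # on this agent: stop inflating incoming batches
    nontx_host07_agent01.sources.avro.compression-type = none

    # matching setting on each upstream avro sink (names are placeholders)
    upstream_agent.sinks.avro_sink.compression-type = none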


Thanks  

--  
Mike Zupan


On Wednesday, October 15, 2014 at 2:27 PM, Ahmed Vila wrote:

> Hi Mike,
>  
> It would be really helpful if you could provide the number of events entering the source.
>  
> Also, please provide the CPU utilization from top, specifically the line that breaks it
> down by user/system/iowait/idle.
> If it shows high iowait, it might be that the channel is using more I/O than your storage
> can handle - especially if it's an NFS or iSCSI mount.
> But the biggest factor is the number of events.
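> On the iowait point: as a quick generic check, running this on the agent host while files
> are coming in should show whether the channel disks are saturated (sustained high %util and
> await on the data/checkpoint volumes would point to an I/O bottleneck):
>  
>     iostat -x 5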
>  
> I see that you actually un-compress the events on arrival at the source and compress them
> back at the sink.
> It's well known that compression/decompression is above all a CPU-bound task.
> That might be a problem and can reduce Flume throughput greatly, especially because you
> have 4 sinks, each doing compression on its own.
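> For reference, that decompress/re-compress round trip corresponds to these lines in your
> config (quoted from your message below):
>  
>     nontx_host07_agent01.sources.avro.compression-type = deflate
>     nontx_host07_agent01.sinks.hdfs_sink_01.hdfs.fileType = CompressedStream
>     nontx_host07_agent01.sinks.hdfs_sink_01.hdfs.codeC = snappy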
>  
> Regards,
> Ahmed Vila
>  
> On Wed, Oct 15, 2014 at 5:32 PM, Mike Zupan <mike.zupan@manage.com> wrote:
> > I'm seeing issues with the Flume servers using very high amounts of CPU. Just wondering
> > if this is a common issue with a file channel. I'm pretty new to Flume, so sorry if this
> > isn't enough to debug the issue.
> >  
> > Current top looks like  
> >  
> >   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> >  8509 root      20   0 22.0g 8.6g 675m S 1109.4 13.7   1682:45 java
> >  8251 root      20   0 21.9g 8.3g 647m S 1083.5 13.2   1476:27 java
> >  7593 root      20   0 12.4g 8.4g  18m S 1007.5 13.4   1866:18 java
> >  
> > As you can see, we have 3 out of 4 Flume servers using ~1000% CPU.
> >  
> > Details are
> >  
> > OS: CentOS 6.5
> > Java: Oracle "1.7.0_45"
> >  
> > Flume: flume-1.4.0.2.1.1.0-385.el6.noarch
> >  
> > Our config for the server looks like this
> >  
> > ###############################################
> > # Agent configuration for transactional data
> > ###############################################
> > nontx_host07_agent01.sources = avro
> > nontx_host07_agent01.channels = fc
> > nontx_host07_agent01.sinks = hdfs_sink_01 hdfs_sink_02 hdfs_sink_03 hdfs_sink_04
> >  
> > ##################################################
> > # info is published to port 9991
> > ##################################################
> > nontx_host07_agent01.sources.avro.type = avro
> > nontx_host07_agent01.sources.avro.bind = 0.0.0.0
> > nontx_host07_agent01.sources.avro.port = 9991
> > nontx_host07_agent01.sources.avro.threads = 100
> > nontx_host07_agent01.sources.avro.compression-type = deflate
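> > # NOTE: deflate here means the source inflates every incoming batch on arrival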
> > nontx_host07_agent01.sources.avro.interceptors = ts id
> > nontx_host07_agent01.sources.avro.interceptors.ts.type = timestamp
> > nontx_host07_agent01.sources.avro.interceptors.ts.preserveExisting = false
> > nontx_host07_agent01.sources.avro.interceptors.id.type = org.apache.flume.sink.solr.morphline.UUIDInterceptor$Builder
> > nontx_host07_agent01.sources.avro.interceptors.id.preserveExisting = true
> >  
> >  
> > ##################################################
> > # The Channels
> > ##################################################
> > nontx_host07_agent01.channels.fc.type = file
> > nontx_host07_agent01.channels.fc.checkpointDir = /flume/channels/checkpoint/nontx_host07_agent01
> > nontx_host07_agent01.channels.fc.dataDirs = /flume/channels/data/nontx_host07_agent01
> > nontx_host07_agent01.channels.fc.capacity = 140000000
> > nontx_host07_agent01.channels.fc.transactionCapacity = 240000
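> > # (transactionCapacity caps the number of events per put/take transaction; it
> > # must be at least the sinks' hdfs.batchSize, and 240000 >= 50000 satisfies that)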
> >  
> > ##################################################
> > # Sinks
> > ##################################################
> > nontx_host07_agent01.sinks.hdfs_sink_01.type = hdfs
> > nontx_host07_agent01.sinks.hdfs_sink_01.hdfs.path = hdfs://cluster01:8020/flume/%{log_type}
> > nontx_host07_agent01.sinks.hdfs_sink_01.hdfs.filePrefix = flume_nontx_host07_agent01_sink01_%Y%m%d%H
> > nontx_host07_agent01.sinks.hdfs_sink_01.hdfs.inUsePrefix=_
> > nontx_host07_agent01.sinks.hdfs_sink_01.hdfs.inUseSuffix=.tmp
> > nontx_host07_agent01.sinks.hdfs_sink_01.hdfs.fileType = CompressedStream
> > nontx_host07_agent01.sinks.hdfs_sink_01.hdfs.codeC = snappy
> > nontx_host07_agent01.sinks.hdfs_sink_01.hdfs.rollSize = 0
> > nontx_host07_agent01.sinks.hdfs_sink_01.hdfs.rollCount = 0
> > nontx_host07_agent01.sinks.hdfs_sink_01.hdfs.rollInterval = 300
> > nontx_host07_agent01.sinks.hdfs_sink_01.hdfs.idleTimeout = 30
> > nontx_host07_agent01.sinks.hdfs_sink_01.hdfs.timeZone = America/Los_Angeles
> > nontx_host07_agent01.sinks.hdfs_sink_01.hdfs.callTimeout = 30000
> > nontx_host07_agent01.sinks.hdfs_sink_01.hdfs.batchSize = 50000
> > nontx_host07_agent01.sinks.hdfs_sink_01.hdfs.round = true
> > nontx_host07_agent01.sinks.hdfs_sink_01.hdfs.roundUnit = minute
> > nontx_host07_agent01.sinks.hdfs_sink_01.hdfs.roundValue = 5
> > nontx_host07_agent01.sinks.hdfs_sink_01.hdfs.threadsPoolSize = 2
> > nontx_host07_agent01.sinks.hdfs_sink_01.serializer = com.manage.flume.serialization.HeaderAndBodyJsonEventSerializer$Builder
> >  
> >  
> > --  
> > Mike Zupan
> >  
>  
>  
>  

