flume-user mailing list archives

From Ritesh Adval <riteshad...@gaikai.com>
Subject Re: Flume File Channel Filling Up The Disk With Transaction Log, Any Way To Prevent It
Date Mon, 25 Nov 2013 23:38:30 GMT
We have one agent each for events and metrics, and there are 3 hops that these
go through (rack, cluster, and zone), so we run these 2 agents together on
each hop (6 agents total, 2 in each VM).

Is running a single agent per VM recommended?

-Ritesh



On Nov 25, 2013, at 3:23 PM, Jeff Lord <jlord@cloudera.com> wrote:

> It's fine to run in a VM.
> Out of curiosity, why are you running two agents on the machine, though?
> 
> 
> 
> On Mon, Nov 25, 2013 at 1:54 PM, Brock Noland <brock@cloudera.com> wrote:
> If the channel is full, your clients will get a rejection notice.
> 
> Capacity planning on the FC is a mix between event size, channel size,
> and disk size. If flume is holding on to the logs, it's because it
> needs them.  If you are constantly running out of space, then yes,
> it's quite likely decreasing channel capacity is a logical course of
> action.
> 
> Brock
> 
> On Mon, Nov 25, 2013 at 3:30 PM, Ritesh Adval <riteshadval@gaikai.com> wrote:
> > Thanks, but if it keeps every tx log that still has events in the channel,
> > then it seems it would run out of disk space, since our clients will keep
> > sending events and it will keep creating tx logs for as long as there is
> > disk space. Or am I missing something here?
> >
> > What we need is for the client to start getting message rejections once the
> > Flume agent's file channel has reached its limit, in terms of either pending
> > messages in the tx logs or capacity. Do you think we should reduce the
> > channel capacity? It is currently set to 1M.
> >
> >
> > Ritesh
> >
> >
> >
> >
> >
> >
> > On Mon, Nov 25, 2013 at 1:00 PM, Brock Noland <brock@cloudera.com> wrote:
> >>
> >> It will keep any tx log that has a corresponding event in the channel
> >> + 2 per data directory.
> >>
> >> On Mon, Nov 25, 2013 at 2:55 PM, Ritesh Adval <riteshadval@gaikai.com>
> >> wrote:
> >> > Thanks, but we do not know how many transaction log files it will create,
> >> > so it may run out of disk space even if we set a lower maxFileSize. Do we
> >> > know the maximum number of log files it will keep in Flume 1.4?
> >> >
> >> > Ritesh
> >> >
> >> >
> >> >
> >> >
> >> > On Mon, Nov 25, 2013 at 12:50 PM, Brock Noland <brock@cloudera.com>
> >> > wrote:
> >> >>
> >> >> Lower the maxFileSize.
> >> >>
> >> >> On Mon, Nov 25, 2013 at 2:41 PM, Ritesh Adval <riteshadval@gaikai.com>
> >> >> wrote:
> >> >> > Hi,
> >> >> >
> >> >> > We are running two Flume 1.4 agents, each with 2 file channels, on a VM
> >> >> > of size 15GB.
> >> >> >
> >> >> > Is a VM recommended for running Flume, or do we need bare-metal boxes?
> >> >> >
> >> >> >
> >> >> > Every week or so we run into a situation where, because the sinks on
> >> >> > these agents are not able to send messages to the upstream agents, the
> >> >> > Flume file channels fill up with large transaction logs.
> >> >> >
> >> >> > Here is what we see on the 4 channels:
> >> >> >
> >> >> > $ du -h /srv/flume/
> >> >> > 4.9G    /srv/flume/metricChannel1-Cluster/data
> >> >> > 7.7M    /srv/flume/metricChannel1-Cluster/checkpoint
> >> >> > 4.9G    /srv/flume/metricChannel1-Cluster
> >> >> > 4.9G    /srv/flume/metricChannel2-Cluster/data
> >> >> > 7.7M    /srv/flume/metricChannel2-Cluster/checkpoint
> >> >> > 4.9G    /srv/flume/metricChannel2-Cluster
> >> >> > 214M    /srv/flume/eventChannel2-Cluster/data
> >> >> > 7.7M    /srv/flume/eventChannel2-Cluster/checkpoint
> >> >> > 222M    /srv/flume/eventChannel2-Cluster
> >> >> > 215M    /srv/flume/eventChannel1-Cluster/data
> >> >> > 7.7M    /srv/flume/eventChannel1-Cluster/checkpoint
> >> >> > 223M    /srv/flume/eventChannel1-Cluster
> >> >> > 11G     /srv/flume/
> >> >> >
> >> >> >
> >> >> > Here is an example of the tx logs on metricChannel1, where we are seeing
> >> >> > 5 log files. Is there a way to restrict the number of log files kept? I
> >> >> > think in older versions of Flume it was a maximum of 2 log files, but we
> >> >> > are seeing more than 2, as shown below:
> >> >> >
> >> >> >
> >> >> >  $ ls -l /srv/flume/metricChannel1-Cluster/data/
> >> >> > total 4.5G
> >> >> > -rw-r--r-- 1 flume flume    0 Nov 23 00:39 in_use.lock
> >> >> > -rw-r--r-- 1 flume flume 1.1G Nov 23 11:11 log-1
> >> >> > -rw-r--r-- 1 flume flume   47 Nov 24 21:14 log-1.meta
> >> >> > -rw-r--r-- 1 flume flume 1.1G Nov 23 21:18 log-2
> >> >> > -rw-r--r-- 1 flume flume   47 Nov 24 21:14 log-2.meta
> >> >> > -rw-r--r-- 1 flume flume 1.1G Nov 24 07:13 log-3
> >> >> > -rw-r--r-- 1 flume flume   47 Nov 24 21:14 log-3.meta
> >> >> > -rw-r--r-- 1 flume flume 1.1G Nov 24 17:08 log-4
> >> >> > -rw-r--r-- 1 flume flume   47 Nov 24 21:14 log-4.meta
> >> >> > -rw-r--r-- 1 flume flume 425M Nov 24 21:15 log-5
> >> >> > -rw-r--r-- 1 flume flume   47 Nov 24 21:14 log-5.meta
> >> >> >
> >> >> >
> >> >> > We have set maxFileSize to 1GB, and it looks like each tx log is within
> >> >> > that limit. The file channel capacity is set to 1M messages:
> >> >> >
> >> >> > agent.channels.metricChannel2.transactionCapacity=1000
> >> >> > agent.channels.metricChannel2.capacity=1000000
> >> >> > agent.channels.metricChannel2.maxFileSize=1073741824
> >> >> >
> >> >> >
> >> >> > What we want to avoid is the transaction logs filling up the disk. Is
> >> >> > there a way to achieve this?
> >> >> > We are OK with discarding messages.
> >> >> >
> >> >> > Thanks
> >> >> > Ritesh
> >> >> >
> >> >> >
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org
> >> >
> >> >
> >>
> >
> >
> 
> 

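A rough way to put numbers on the capacity-planning advice quoted above (event size, channel capacity, and disk size, plus the "+ 2 per data directory" rule) is sketched below. This is only a back-of-the-envelope estimate, not how Flume itself accounts for space: the function name, the 4 KiB average event size, and the single data directory are assumptions made for illustration, and the estimate ignores per-record overhead and the take/commit records that also land in the logs, so real usage can be higher.

```python
import math

GIB = 1024 ** 3
MIB = 1024 ** 2


def estimate_file_channel_disk_bytes(capacity_events, avg_event_bytes,
                                     max_file_size_bytes, data_dirs=1,
                                     extra_logs_per_dir=2):
    """Rough disk estimate for a full file channel (hypothetical helper).

    Assumes every event takes avg_event_bytes on disk, that events pack
    perfectly into log files, and that 2 extra log files are kept per
    data directory, as described in the thread.
    """
    payload_bytes = capacity_events * avg_event_bytes
    # Log files that still hold at least one event cannot be deleted.
    logs_holding_events = math.ceil(payload_bytes / max_file_size_bytes)
    total_logs = logs_holding_events + extra_logs_per_dir * data_dirs
    return total_logs * max_file_size_bytes


# Settings from the thread (capacity=1000000, maxFileSize=1073741824),
# with an assumed average event size of 4 KiB.
current = estimate_file_channel_disk_bytes(1_000_000, 4096, 1073741824)
print(f"{current / GIB:.1f} GiB")  # 6.0 GiB

# Same assumed event size with a lower capacity and a 256 MiB maxFileSize.
reduced = estimate_file_channel_disk_bytes(100_000, 4096, 256 * MIB)
print(f"{reduced / GIB:.1f} GiB")  # 1.0 GiB
```

Multiplying the result by the number of file channels sharing a disk gives a figure to compare against the disk size. Under these assumptions, four channels at the current settings would not fit on a 15GB disk, which is in line with the suggestion to lower capacity and maxFileSize.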