flume-user mailing list archives

From "Hari Shreedharan" <hshreedha...@cloudera.com>
Subject RE: File channels creating many large files
Date Mon, 10 Nov 2014 09:14:32 GMT
That value is in bytes. At 500000 (about 500 KB), you will likely end up with too many files.
You should set it as high as you can.



Thanks,
Hari
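
For reference, a minimal sketch of where this setting would go in the configuration quoted below. The agent and channel names (a1, ch1) and the paths come from the original post; the maxFileSize value of 1073741824 (1 GiB) is only an illustrative assumption, not a recommendation made in this thread. To my knowledge the Flume user guide documents the default as 2146435071 bytes (about 2 GB), but that is worth checking against the documentation for your version.

```properties
# File channel from the original post, with maxFileSize added.
a1.channels.ch1.type = file
a1.channels.ch1.checkpointDir = /hadoop/user/flume/channels/checkpoint
a1.channels.ch1.dataDirs = /hadoop/user/flume/channels/data
a1.channels.ch1.capacity = 100000
a1.channels.ch1.transactionCapacity = 5000
# Maximum size of a single data log file, in bytes (not KB or MB).
# 1073741824 = 1 GiB is an assumed example value; 500000 would be about 500 KB.
a1.channels.ch1.maxFileSize = 1073741824
```

Going by the replies in this thread, at least 2 files are kept per data directory regardless, so a smaller maxFileSize mostly trades a few large files for many small ones.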

On Mon, Nov 10, 2014 at 1:05 AM, Needham, Guy
<Guy.Needham@virginmedia.co.uk> wrote:

> Hari, Jeff,
> thanks for your replies. It's Flume 1.5.0, I'll use the maxFileSize parameter to fix this. Is there any impact on channel optimisation from setting it to say 500000?
> Regards,
> Guy Needham | Data Discovery
> Virgin Media | Enterprise Data, Design & Management
> Bartley Wood Business Park, Hook, Hampshire RG27 9UP
> D 01256 75 3362
> I welcome VSRE emails. Learn more at http://vsre.info/
> ________________________________
> From: Hari Shreedharan [mailto:hshreedharan@cloudera.com]
> Sent: 07 November 2014 17:59
> To: user@flume.apache.org
> Cc: user@flume.apache.org
> Subject: Re: File channels creating many large files
> Flume will leave at least 2 files per data directory. Once you have enough events to cause 2 files to be created, there will be at least 2 per dir. You can use maxFileSize parameter to control the size of these files.
> Thanks, Hari
> On Fri, Nov 7, 2014 at 10:25 AM, Jeff Lord <jlord@cloudera.com> wrote:
> Guy,
> What version of flume is this?
> -Jeff
> On Fri, Nov 7, 2014 at 1:19 AM, Needham, Guy <Guy.Needham@virginmedia.co.uk> wrote:
> Hi all,
> I have a configuration with a file channel configured such that:
> a1.channels.ch1.type = file
> a1.channels.ch1.checkpointDir = /hadoop/user/flume/channels/checkpoint
> a1.channels.ch1.dataDirs = /hadoop/user/flume/channels/data
> a1.channels.ch1.capacity = 100000
> a1.channels.ch1.transactionCapacity = 5000
> It's been running since October 28th with no issues, but when I looked today in /hadoop/user/flume/channels/data I saw that the file channel was building up large files which had been processed and not deleting them:
> [rdd@hadoop-kn-p2-m01 flume]$ ls -lh channels/data/
> total 1.6G
> -rw-r----- 1 rdd rdd 1.5G Oct 28 16:10 log-1
> -rw-r----- 1 rdd rdd   47 Oct 28 16:10 log-1.meta
> -rw-r----- 1 rdd rdd  72M Oct 31 16:28 log-2
> -rw-r----- 1 rdd rdd   47 Oct 31 16:29 log-2.meta
> It seems like for each day that data landed (we're still in testing so data not landing constantly) a data file has been created but not deleted when reading was completed.
> Is this expected behaviour? Is there a way to stop large files building up and still use the file channel?
> Regards,
> Guy Needham | Data Discovery
> Virgin Media | Enterprise Data, Design & Management
> Bartley Wood Business Park, Hook, Hampshire RG27 9UP
> D 01256 75 3362
> I welcome VSRE emails. Learn more at http://vsre.info/
> --------------------------------------------------------------------
> Save Paper - Do you really need to print this e-mail?
> Visit www.virginmedia.com for more information, and more fun.
> This email and any attachments are or may be confidential and legally privileged
> and are sent solely for the attention of the addressee(s). If you have received this
> email in error, please delete it from your system: its use, disclosure or copying is
> unauthorised. Statements and opinions expressed in this email may not represent
> those of Virgin Media. Any representations or commitments in this email are
> subject to contract.
> Registered office: Media House, Bartley Wood Business Park, Hook, Hampshire, RG27 9UP
> Registered in England and Wales with number 2591237
> --------------------------------------------------------------------