Date: Fri, 07 Nov 2014 09:59:12 -0800 (PST)
From: "Hari Shreedharan"
To: user@flume.apache.org
Cc: user@flume.apache.org
Subject: Re: File channels creating many large files

Flume will leave at least 2 files per data directory. Once you have enough events to cause 2 files to be created, there will always be at least 2 per directory. You can use the maxFileSize parameter to control the size of these files.
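For illustration, this is roughly how the maxFileSize suggestion could be applied to the channel configuration quoted below. It is a sketch, not a tested recommendation: maxFileSize is specified in bytes (the Flume User Guide lists a default of 2146435071, about 2 GB), and the 256 MiB value is an arbitrary example. Because the channel keeps at least two log files per data directory, a smaller cap mainly limits how large those retained files can grow.

```properties
# Same file channel as in the original config, plus an explicit log file size cap.
a1.channels.ch1.type = file
a1.channels.ch1.checkpointDir = /hadoop/user/flume/channels/checkpoint
a1.channels.ch1.dataDirs = /hadoop/user/flume/channels/data
a1.channels.ch1.capacity = 100000
a1.channels.ch1.transactionCapacity = 5000
# Illustrative value only: 268435456 bytes = 256 MiB per log file
# (the documented default is 2146435071 bytes).
a1.channels.ch1.maxFileSize = 268435456
```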
Thanks,
Hari

On Fri, Nov 7, 2014 at 10:25 AM, Jeff Lord wrote:

> Guy,
> What version of flume is this?
> -Jeff
>
> On Fri, Nov 7, 2014 at 1:19 AM, Needham, Guy wrote:
>
>> Hi all,
>>
>> I have a configuration with a file channel configured such that:
>>
>> a1.channels.ch1.type = file
>> a1.channels.ch1.checkpointDir = /hadoop/user/flume/channels/checkpoint
>> a1.channels.ch1.dataDirs = /hadoop/user/flume/channels/data
>> a1.channels.ch1.capacity = 100000
>> a1.channels.ch1.transactionCapacity = 5000
>>
>> It's been running since October 28th with no issues, but when I looked
>> today in /hadoop/user/flume/channels/data I saw that the file channel was
>> building up large files which had been processed and not deleting them:
>>
>> [rdd@hadoop-kn-p2-m01 flume]$ ls -lh channels/data/
>> total 1.6G
>> -rw-r----- 1 rdd rdd 1.5G Oct 28 16:10 log-1
>> -rw-r----- 1 rdd rdd   47 Oct 28 16:10 log-1.meta
>> -rw-r----- 1 rdd rdd  72M Oct 31 16:28 log-2
>> -rw-r----- 1 rdd rdd   47 Oct 31 16:29 log-2.meta
>>
>> It seems like for each day that data landed (we're still in testing so
>> data not landing constantly) a data file has been created but not deleted
>> when reading was completed.
>> Is this expected behaviour? Is there a way to stop large files building up
>> and still use the file channel?
>> Regards,
>> Guy Needham | Data Discovery
>> Virgin Media | Enterprise Data, Design & Management
>> Bartley Wood Business Park, Hook, Hampshire RG27 9UP
>> D 01256 75 3362
>> I welcome VSRE emails. Learn more at http://vsre.info/
>>
>> --------------------------------------------------------------------
>> Save Paper - Do you really need to print this e-mail?
>>
>> Visit www.virginmedia.com for more information, and more fun.
>>
>> This email and any attachments are or may be confidential and legally
>> privileged and are sent solely for the attention of the addressee(s).
>> If you have received this email in error, please delete it from your
>> system: its use, disclosure or copying is unauthorised. Statements and
>> opinions expressed in this email may not represent those of Virgin Media.
>> Any representations or commitments in this email are subject to contract.
>>
>> Registered office: Media House, Bartley Wood Business Park, Hook,
>> Hampshire, RG27 9UP
>> Registered in England and Wales with number 2591237