From: Hari Shreedharan
To: user@flume.apache.org
Cc: user@flume.apache.org
Date: Mon, 10 Nov 2014 01:14:32 -0800 (PST)
Subject: RE: File channels creating many large files
Message-Id: <1415610872247.c47f27a8@Nodemailer>
In-Reply-To: <3C49C1AEF3580C4C92764F36260B009A3544B9EF@WB2-MBX-P0002.systems.private>

That value is in bytes. At 500k, you will likely end up with too many files. You should set it as high as you can.

Thanks, Hari
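
For readers following along, here is a minimal sketch of how the setting might look, assuming the agent and channel are named a1 and ch1 as in the configuration quoted further down this thread. The value is illustrative and not taken from the thread: 2146435071 bytes (roughly 2 GB) is the default listed for maxFileSize in the Flume User Guide, so it only makes the ceiling explicit, whereas 500000 would cap each log file at about 500 KB.

```properties
# Illustrative sketch only; agent and channel names (a1, ch1) come from the original post.
a1.channels.ch1.type = file
a1.channels.ch1.checkpointDir = /hadoop/user/flume/channels/checkpoint
a1.channels.ch1.dataDirs = /hadoop/user/flume/channels/data
a1.channels.ch1.capacity = 100000
a1.channels.ch1.transactionCapacity = 5000
# maxFileSize is given in bytes. The value below is an assumption for illustration
# (the documented default, about 2 GB); a small value such as 500000 means many small files.
a1.channels.ch1.maxFileSize = 2146435071
```

Whatever value is chosen, expect at least two log files to remain in each data directory, as noted in the earlier reply quoted below.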


On Mon, Nov 10, 2014 at 1:05 AM, Needham, Guy <Guy.Needham@virginmedia.co.uk> wrote:

Hari, Jeff,

thanks for your replies. It's Flume 1.5.0, I'll use the maxFileSize parameter to fix this. Is there any impact on channel optimisation from setting it to say 500000?

Regards,
Guy Needham | Data Discovery
Virgin Media | Enterprise Data, Design & Management
Bartley Wood Business Park, Hook, Hampshire RG27 9UP
D 01256 75 3362

I welcome VSRE emails. Learn more at http://vsre.info/


From: Hari Shreedharan [mailto:hshreedharan@cloudera.com]
Sent: 07 November 2014 17:59
To: user@flume.apache.org
Cc: user@flume.apache.org
Subject: Re: File channels creating many large files

Flume will leave at least 2 files per data directory. Once you have enough events to cause 2 files to be created, there will be at least 2 per dir. You can use the maxFileSize parameter to control the size of these files.

Thanks, Hari


On Fri, Nov 7, 2014 at 10:25 AM, Jeff Lord <jlord@cloudera.com> wrote:

Guy,

What version of Flume is this?

-Jeff

On Fri, Nov 7, 2014 at 1:19 AM, Needham, Guy <Guy.Needham@virginmedia.co.uk> wrote:
Hi all,
I have a configuration with a file channel configured such that:

a1.channels.ch1.type = file
a1.channels.ch1.checkpointDir = /hadoop/user/flume/channels/checkpoint
a1.channels.ch1.dataDirs = /hadoop/user/flume/channels/data
a1.channels.ch1.capacity = 100000
a1.channels.ch1.transactionCapacity = 5000

It's been running since October 28th with no issues, but when I looked today in /hadoop/user/flume/channels/data I saw that the file channel was building up large files which had been processed and not deleting them:
[rdd@hadoop-kn-p2-m01 flume]$ ls -lh channels/data/
total 1.6G
-rw-r----- 1 rdd rdd 1.5G Oct 28 16:10 log-1
-rw-r----- 1 rdd rdd   47 Oct 28 16:10 log-1.meta
-rw-r----- 1 rdd rdd  72M Oct 31 16:28 log-2
-rw-r----- 1 rdd rdd   47 Oct 31 16:29 log-2.meta
It seems like for each day that data landed (we're still in testing, so data is not landing constantly) a data file has been created but not deleted when reading was completed.
Is this expected behaviour? Is there a way to stop large files building up and still use the file channel?
Regards,
Guy Needham | Data Discovery
Virgin Media | Enterprise Data, Design & Management
Bartley Wood Business Park, Hook, Hampshire RG27 9UP
D 01256 75 3362
I welcome VSRE emails. Learn more at http://vsre.info/


--------------------------------------------------------------------
Save Paper - Do you really need to print this e-mail?

Visit www.virginmedia.com for more information, and more fun.

This email and any attachments are or may be confidential and legally privileged
and are sent solely for the attention of the addressee(s). If you have received this
email in error, please delete it from your system: its use, disclosure or copying is
unauthorised. Statements and opinions expressed in this email may not represent
those of Virgin Media. Any representations or commitments in this email are
subject to contract.

Registered office: Media House, Bartley Wood Business Park, Hook, Hampshire, RG27 9UP
Registered in England and Wales with number 2591237



