Subject: Re: Flume hdfs sink rollover
From: Denny Ye
To: user@flume.apache.org
Date: Mon, 27 Aug 2012 09:29:05 +0800

Yes, you are right. Flume uses the uncompressed size to decide when to roll. The size is calculated in memory. Normally the compression ratio of snappy might be 5x-10x, and better still if the data contains a lot of duplication. So the file sizes fit your setting, do you agree?

-Regards
Denny Ye

2012/8/26 Mohit Anchlia <mohitanchlia@gmail.com>

> On Sun, Aug 26, 2012 at 6:47 AM, Denny Ye <dennyy99@gmail.com> wrote:
>
>> hi Mohit,
>>     Why do you conclude that it isn't rolling on time? I think it is reaching the size limit of your 'hdfs.rollSize' setting. Each snappy file is almost five hundred megabytes, one every 6 or 7 minutes. That fits the compression ratio of the snappy format.
>>     I rearranged your file listing in order. It looks fine from my point of view.
>>
>
> Size I gave is 5G in my conf but it rolls over at 400M. Does it mean that flume uses uncompressed size to determine when to rollover? Is it all calculated in memory as it writes to the sink and before it compresses?
>
>> -rwxr-xr-x   3 root root  170657363 2012-08-24 14:58 /flume_vol/flume/2012/08/24/13/dslg1/web.1345840674873.snappy
>> -rwxr-xr-x   3 root root  407700267 2012-08-24 13:57 /flume_vol/flume/2012/08/24/13/dslg1/web.1345840674872.snappy
>> -rwxr-xr-x   3 root root  407678663 2012-08-24 13:50 /flume_vol/flume/2012/08/24/13/dslg1/web.1345840674871.snappy
>> -rwxr-xr-x   3 root root  407742601 2012-08-24 13:44 /flume_vol/flume/2012/08/24/13/dslg1/web.1345840674870.snappy
>> -rwxr-xr-x   3 root root   28118740 2012-08-24 13:35 /flume_vol/flume/2012/08/24/13/dslg1/web.1345840475805.snappy.tmp
>>
>> -rwxr-xr-x   3 root root  159909773 2012-08-24 15:04 /flume_vol/flume/2012/08/24/13/dslg2/web.1345840668740.snappy
>> -rwxr-xr-x   3 root root  407739053 2012-08-24 13:57 /flume_vol/flume/2012/08/24/13/dslg2/web.1345840668739.snappy
>> -rwxr-xr-x   3 root root  407786389 2012-08-24 13:50 /flume_vol/flume/2012/08/24/13/dslg2/web.1345840668738.snappy
>> -rwxr-xr-x   3 root root  407757832 2012-08-24 13:44 /flume_vol/flume/2012/08/24/13/dslg2/web.1345840668737.snappy
>> -rwxr-xr-x   3 root root   51085873 2012-08-24 13:36 /flume_vol/flume/2012/08/24/13/dslg2/web.1345840465501.snappy
>>
>> -Regards
>> Denny Ye
>>
>> 2012/8/25 Mohit Anchlia <mohitanchlia@gmail.com>
>>
>>> I have rollover defined to roll either every 5G or every 1+ hr, but it doesn't seem to be working. Could you please tell me whether I have the conf configured incorrectly?
>>>
>>> foo.sinks.hdfsSink.hdfs.filePrefix = web
>>> foo.sinks.hdfsSink.hdfs.rollInterval = 4000
>>> foo.sinks.hdfsSink.hdfs.rollCount = 0
>>> foo.sinks.hdfsSink.hdfs.rollSize = 5000000000
>>> foo.sinks.hdfsSink.hdfs.fileType = SequenceFile
>>> foo.sinks.hdfsSink.hdfs.codeC = snappy
>>>
>>> drwxr-xr-x   - root root          5 2012-08-24 14:58 /flume_vol/flume/2012/08/24/13/dslg1
>>> -rwxr-xr-x   3 root root   28118740 2012-08-24 13:35 /flume_vol/flume/2012/08/24/13/dslg1/web.1345840475805.snappy.tmp
>>> -rwxr-xr-x   3 root root  407700267 2012-08-24 13:57 /flume_vol/flume/2012/08/24/13/dslg1/web.1345840674872.snappy
>>> -rwxr-xr-x   3 root root  407742601 2012-08-24 13:44 /flume_vol/flume/2012/08/24/13/dslg1/web.1345840674870.snappy
>>> -rwxr-xr-x   3 root root  170657363 2012-08-24 14:58 /flume_vol/flume/2012/08/24/13/dslg1/web.1345840674873.snappy
>>> -rwxr-xr-x   3 root root  407678663 2012-08-24 13:50 /flume_vol/flume/2012/08/24/13/dslg1/web.1345840674871.snappy
>>> drwxr-xr-x   - root root          5 2012-08-24 15:04 /flume_vol/flume/2012/08/24/13/dslg2
>>> -rwxr-xr-x   3 root root  407786389 2012-08-24 13:50 /flume_vol/flume/2012/08/24/13/dslg2/web.1345840668738.snappy
>>> -rwxr-xr-x   3 root root  407757832 2012-08-24 13:44 /flume_vol/flume/2012/08/24/13/dslg2/web.1345840668737.snappy
>>> -rwxr-xr-x   3 root root  159909773 2012-08-24 15:04 /flume_vol/flume/2012/08/24/13/dslg2/web.1345840668740.snappy
>>> -rwxr-xr-x   3 root root   51085873 2012-08-24 13:36 /flume_vol/flume/2012/08/24/13/dslg2/web.1345840465501.snappy
>>> -rwxr-xr-x   3 root root  407739053 2012-08-24 13:57 /flume_vol/flume/2012/08/24/13/dslg2/web.1345840668739.snappy
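
A quick way to sanity-check the explanation in this thread: if the sink counts uncompressed bytes against hdfs.rollSize, then dividing rollSize by the on-disk size of a size-rolled file gives the implied compression ratio. The Python sketch below does that arithmetic with the numbers from this thread. It assumes the six files of roughly 407 MB were rolled by size rather than by time, and the helper name implied_ratio is made up for illustration; it is not part of Flume.

```python
# Implied compression ratio, assuming hdfs.rollSize is compared against
# uncompressed bytes (as described in this thread).
ROLL_SIZE_BYTES = 5_000_000_000  # hdfs.rollSize from the config in this thread

# On-disk sizes in bytes of the files that appear to have rolled by size,
# copied from the dslg1 and dslg2 listings in this thread.
ROLLED_FILE_SIZES = [
    407_742_601, 407_678_663, 407_700_267,  # dslg1
    407_757_832, 407_786_389, 407_739_053,  # dslg2
]


def implied_ratio(roll_size, on_disk_size):
    """Hypothetical helper: uncompressed bytes per byte written to disk."""
    return roll_size / on_disk_size


ratios = [implied_ratio(ROLL_SIZE_BYTES, s) for s in ROLLED_FILE_SIZES]
for size, ratio in zip(ROLLED_FILE_SIZES, ratios):
    print(f"{size} bytes on disk -> about {ratio:.2f}x")
```

Under that assumption every file comes out a little above 12x, somewhat higher than the 5x-10x range mentioned in the reply, which would be consistent with log data that contains a lot of repetition.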