flume-user mailing list archives

From Hari Shreedharan <hshreedha...@cloudera.com>
Subject Re: File Channel Backup Checkpoints are I/O Intensive
Date Thu, 12 Jun 2014 00:39:39 GMT
Thanks. I will review it :)  


Thanks,
Hari


On Wednesday, June 11, 2014 at 5:00 PM, Abraham Fine wrote:

> I went ahead and created a JIRA and patch:
> https://issues.apache.org/jira/browse/FLUME-2401
>  
> The option is configurable with:
> agentX.channels.ch1.compressBackupCheckpoint = true
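> 
> For context, in a full channel definition the new key would sit alongside
> the usual file channel settings, roughly like this (the paths and capacity
> below are placeholders, not values from a real setup):
> 
> agentX.channels.ch1.type = file
> agentX.channels.ch1.capacity = 100000000
> agentX.channels.ch1.checkpointDir = /disk1/flume/checkpoint
> agentX.channels.ch1.useDualCheckpoints = true
> agentX.channels.ch1.backupCheckpointDir = /disk2/flume/checkpoint-backup
> agentX.channels.ch1.dataDirs = /disk3/flume/data
> agentX.channels.ch1.compressBackupCheckpoint = true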
>  
> As per your recommendation, I used snappy-java. I also considered the
> snappy and lz4 implementations in Hadoop IO but noticed that the
> Hadoop IO dependency was removed in
> https://issues.apache.org/jira/browse/FLUME-1285
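> 
> At its core the change is just a Snappy round trip around the serialized
> checkpoint, along these lines (a minimal sketch for illustration, not the
> actual code from the patch):
> 
> import org.xerial.snappy.Snappy;
> 
> // Illustrative only: compress the serialized backup checkpoint before it
> // is written out, and decompress it again if the backup is ever needed.
> public final class BackupCheckpointCompression {
> 
>     static byte[] compress(byte[] rawCheckpoint) throws java.io.IOException {
>         return Snappy.compress(rawCheckpoint);
>     }
> 
>     static byte[] decompress(byte[] compressed) throws java.io.IOException {
>         return Snappy.uncompress(compressed);
>     }
> }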
>  
> Thanks,
> Abe
> --  
> Abraham Fine | Software Engineer
> (516) 567-2535
> BrightRoll, Inc. | Smart Video Advertising | www.brightroll.com
>  
>  
> On Mon, Jun 9, 2014 at 4:01 PM, Hari Shreedharan
> <hshreedharan@cloudera.com> wrote:
> > Hi Abraham,
> >  
> > Compressing the backup checkpoint is very possible. The backup is rarely
> > read (it is only used if the original one is corrupt on restart), so I
> > think compressing it with something like Snappy would make sense (GZIP
> > might hurt performance). Can you try snappy-java and see if it gives good
> > performance and reasonable compression?
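> > 
> > A quick harness along these lines would give a rough comparison on a
> > mostly-empty buffer (illustrative only; real numbers should come from an
> > actual checkpoint file):
> > 
> > import java.io.ByteArrayOutputStream;
> > import java.util.zip.GZIPOutputStream;
> > import org.xerial.snappy.Snappy;
> > 
> > public class CompressionComparison {
> >     public static void main(String[] args) throws Exception {
> >         // A mostly-zero buffer approximates a nearly empty checkpoint.
> >         byte[] sparse = new byte[64 * 1024 * 1024];
> > 
> >         long t0 = System.nanoTime();
> >         byte[] snappyOut = Snappy.compress(sparse);
> >         long snappyMs = (System.nanoTime() - t0) / 1_000_000;
> > 
> >         long t1 = System.nanoTime();
> >         ByteArrayOutputStream bos = new ByteArrayOutputStream();
> >         try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
> >             gz.write(sparse);
> >         }
> >         long gzipMs = (System.nanoTime() - t1) / 1_000_000;
> > 
> >         System.out.printf("snappy: %d bytes, %d ms%n", snappyOut.length, snappyMs);
> >         System.out.printf("gzip:   %d bytes, %d ms%n", bos.size(), gzipMs);
> >     }
> > }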
> >  
> > Patches are always welcome. I’d be glad to review and commit it. I would
> > suggest making the compression optional via configuration, so that users
> > with smaller channels don’t end up spending CPU for little gain.
> >  
> >  
> > Thanks,
> > Hari
> >  
> > On Monday, June 9, 2014 at 3:56 PM, Abraham Fine wrote:
> >  
> > Hello-
> >  
> > We are using Flume 1.4 with File Channel configured to use a very
> > large capacity. We keep the checkpoint and backup checkpoint on
> > separate disks.
> >  
> > Normally the file channel is mostly empty (<<1% of capacity). For the
> > checkpoint the disk I/O seems to be very reasonable due to the usage
> > of a MappedByteBuffer.
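> > 
> > (For reference, the checkpoint file is memory-mapped, so the OS only
> > writes back the pages that actually change rather than rewriting the
> > whole file. A rough sketch of that access pattern, with a placeholder
> > path:)
> > 
> > import java.io.RandomAccessFile;
> > import java.nio.MappedByteBuffer;
> > import java.nio.channels.FileChannel;
> > 
> > public class CheckpointMapSketch {
> >     public static void main(String[] args) throws Exception {
> >         // Placeholder path; the real file lives under checkpointDir.
> >         try (RandomAccessFile raf =
> >                 new RandomAccessFile("/disk1/flume/checkpoint/checkpoint", "rw")) {
> >             long size = Math.max(raf.length(), 8);
> >             MappedByteBuffer map = raf.getChannel()
> >                     .map(FileChannel.MapMode.READ_WRITE, 0, size);
> >             map.putLong(0, System.currentTimeMillis()); // touch a few bytes
> >             map.force();                                // flush only dirty pages
> >         }
> >     }
> > }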
> >  
> > On the other hand, the backup checkpoint seems to be written to disk
> > in its entirety over and over again, resulting in very high disk
> > utilization.
> >  
> > I noticed that, because the checkpoint file is mostly empty, it is
> > very compressible. I was able to GZIP our checkpoint from 381M to
> > 386K. I was wondering if it would be possible to always compress the
> > backup checkpoint before writing it to disk.
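> > 
> > (That number is easy to reproduce with something like the sketch below,
> > which gzips a copy of the checkpoint and prints both sizes; the path is
> > just an example:)
> > 
> > import java.io.FileInputStream;
> > import java.io.FileOutputStream;
> > import java.nio.file.Files;
> > import java.nio.file.Paths;
> > import java.util.zip.GZIPOutputStream;
> > 
> > public class GzipCheckpoint {
> >     public static void main(String[] args) throws Exception {
> >         String in = "/disk1/flume/checkpoint/checkpoint";
> >         String out = in + ".gz";
> >         try (FileInputStream fis = new FileInputStream(in);
> >              GZIPOutputStream gz = new GZIPOutputStream(new FileOutputStream(out))) {
> >             byte[] buf = new byte[8192];
> >             int n;
> >             while ((n = fis.read(buf)) != -1) {
> >                 gz.write(buf, 0, n); // stream the whole file through gzip
> >             }
> >         }
> >         System.out.printf("original:   %d bytes%n", Files.size(Paths.get(in)));
> >         System.out.printf("compressed: %d bytes%n", Files.size(Paths.get(out)));
> >     }
> > }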
> >  
> > I would be happy to work on a patch to implement this functionality if
> > there is interest.
> >  
> > Thanks in Advance,
> >  
> > --
> > Abraham Fine | Software Engineer
> > (516) 567-2535
> > BrightRoll, Inc. | Smart Video Advertising | www.brightroll.com
> >  
>  
>  
>  


