flume-user mailing list archives

From Hari Shreedharan <hshreedha...@cloudera.com>
Subject Re: File Channel Backup Checkpoints are I/O Intensive
Date Mon, 09 Jun 2014 23:01:36 GMT
Hi Abraham,  

Compressing the backup checkpoint is very possible. The backup is rarely read: it is used
only if the original checkpoint is corrupt on restart. So I think compressing it using something
like Snappy would make sense (GZIP might hurt performance). Can you try using snappy-java and
see if that gives good perf and reasonable compression?

Patches are always welcome. I’d be glad to review and commit one. I would suggest making
the compression optional via configuration so that anyone with smaller channels doesn’t end
up spending CPU for not much gain.
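
As a rough sketch of what an optional compressed backup could look like (this is not Flume's actual code): the class name, method names, and the boolean `compress` flag below are all hypothetical. It also uses the JDK's built-in GZIP streams purely as a stand-in so the example runs without extra dependencies; a real patch would more likely swap in a Snappy stream, as suggested above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch, not part of Flume. GZIP is a stand-in for Snappy.
public class BackupCheckpointSketch {

  // Copies everything from in to out using a fixed-size buffer.
  private static void copy(InputStream in, OutputStream out) throws IOException {
    byte[] buf = new byte[64 * 1024];
    int n;
    while ((n = in.read(buf)) != -1) {
      out.write(buf, 0, n);
    }
  }

  // Writes the backup copy; compresses only when the (assumed) flag is set,
  // so small channels can skip the CPU cost.
  public static void writeBackup(Path checkpoint, Path backup, boolean compress)
      throws IOException {
    try (InputStream in = Files.newInputStream(checkpoint);
         OutputStream raw = Files.newOutputStream(backup);
         OutputStream out = compress ? new GZIPOutputStream(raw) : raw) {
      copy(in, out);
    }
  }

  // Restores the checkpoint from the backup, decompressing if it was compressed.
  public static void restoreBackup(Path backup, Path checkpoint, boolean compressed)
      throws IOException {
    try (InputStream raw = Files.newInputStream(backup);
         InputStream in = compressed ? new GZIPInputStream(raw) : raw;
         OutputStream out = Files.newOutputStream(checkpoint)) {
      copy(in, out);
    }
  }

  public static void main(String[] args) throws IOException {
    // Simulate a mostly empty checkpoint: 8 MiB of zeros with a couple of set bytes.
    byte[] data = new byte[8 * 1024 * 1024];
    data[0] = 1;
    data[4096] = 42;

    Path checkpoint = Files.createTempFile("checkpoint", ".bin");
    Path backup = Files.createTempFile("checkpoint-backup", ".gz");
    try {
      Files.write(checkpoint, data);
      writeBackup(checkpoint, backup, true);
      System.out.println("checkpoint bytes: " + Files.size(checkpoint));
      System.out.println("compressed backup bytes: " + Files.size(backup));
    } finally {
      Files.deleteIfExists(checkpoint);
      Files.deleteIfExists(backup);
    }
  }
}
```

With the flag off, the backup is a plain byte-for-byte copy, so existing behavior is preserved for anyone who does not opt in.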


On Monday, June 9, 2014 at 3:56 PM, Abraham Fine wrote:

> Hello-
> We are using Flume 1.4 with File Channel configured to use a very
> large capacity. We keep the checkpoint and backup checkpoint on
> separate disks.
> Normally the file channel is mostly empty (<<1% of capacity). For the
> checkpoint the disk I/O seems to be very reasonable due to the usage
> of a MappedByteBuffer.
> On the other hand, the backup checkpoint seems to be written to disk
> in its entirety over and over again, resulting in very high disk
> utilization.
> I noticed that, because the checkpoint file is mostly empty, it is
> very compressible. I was able to GZIP our checkpoint from 381M to
> 386K. I was wondering if it would be possible to always compress the
> backup checkpoint before writing it to disk.
> I would be happy to work on a patch to implement this functionality if
> there is interest.
> Thanks in Advance,
> --  
> Abraham Fine | Software Engineer
> (516) 567-2535
> BrightRoll, Inc. | Smart Video Advertising | www.brightroll.com
