flume-user mailing list archives

From Hari Shreedharan <hshreedha...@cloudera.com>
Subject Re: File Channel Backup Checkpoints are I/O Intensive
Date Mon, 09 Jun 2014 23:01:36 GMT
Hi Abraham,  

Compressing the backup checkpoint is very possible. The backup is rarely read: it is used
only if the original checkpoint is corrupt on restart. So I think compressing it using something
like Snappy would make sense (GZIP might hurt performance). Can you try using snappy-java and
see if that gives good perf and reasonable compression?
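
A minimal sketch of how such a measurement could be set up. It builds a mostly zero
buffer as a stand-in for a nearly empty checkpoint and reports compressed size and elapsed
time. To stay dependency-free it uses the JDK's GZIPOutputStream rather than Snappy; the
class name, method names, and buffer sizes are all made up for illustration and are not
taken from Flume.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

public class BackupCompressionSketch {

    // Compress a byte array with the JDK's GZIP (stand-in for Snappy in this sketch).
    static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(raw);
        }
        return out.toByteArray();
    }

    // Build a buffer that is mostly zeros, imitating a nearly empty checkpoint.
    // Only the first usedBytes bytes are filled with non-zero-ish data.
    static byte[] mostlyEmpty(int size, int usedBytes) {
        byte[] buf = new byte[size];
        for (int i = 0; i < usedBytes && i < size; i++) {
            buf[i] = (byte) (i * 31 + 7);
        }
        return buf;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sizes: 16 MiB buffer with 64 KiB in use.
        byte[] raw = mostlyEmpty(16 * 1024 * 1024, 64 * 1024);
        long start = System.nanoTime();
        byte[] packed = gzip(raw);
        long micros = (System.nanoTime() - start) / 1000;
        System.out.printf("raw=%d bytes, compressed=%d bytes, took %d us%n",
                raw.length, packed.length, micros);
    }
}
```

With snappy-java on the classpath, the body of gzip() could be swapped for
org.xerial.snappy.Snappy.compress(raw) to compare the two on the same buffer. A real
checkpoint file would be a better input than this synthetic one.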

Patches are always welcome. I’d be glad to review and commit one. I would suggest making
the compression optional via configuration so that anyone with smaller channels doesn’t end
up spending CPU for not much gain.


Thanks,
Hari


On Monday, June 9, 2014 at 3:56 PM, Abraham Fine wrote:

> Hello-
>  
> We are using Flume 1.4 with File Channel configured to use a very
> large capacity. We keep the checkpoint and backup checkpoint on
> separate disks.
>  
> Normally the file channel is mostly empty (<<1% of capacity). For the
> checkpoint the disk I/O seems to be very reasonable due to the usage
> of a MappedByteBuffer.
>  
> On the other hand, the backup checkpoint seems to be written to disk
> in its entirety over and over again, resulting in very high disk
> utilization.
>  
> I noticed that, because the checkpoint file is mostly empty, it is
> very compressible. I was able to GZIP our checkpoint from 381M to
> 386K. I was wondering if it would be possible to always compress the
> backup checkpoint before writing it to disk.
>  
> I would be happy to work on a patch to implement this functionality if
> there is interest.
>  
> Thanks in Advance,
>  
> --  
> Abraham Fine | Software Engineer
> (516) 567-2535
> BrightRoll, Inc. | Smart Video Advertising | www.brightroll.com (http://www.brightroll.com)
>  
>  


