flume-user mailing list archives

From: Gary Malouf <malouf.g...@gmail.com>
Subject: Re: Enabling file channel backup checkpoint causes significant disk IO at start-up
Date: Mon, 08 Sep 2014 20:59:59 GMT

Hi Hari,

I'm a colleague of Michael's.  If we need a few of these patches,
would you recommend that we do our own custom build?

Separately from Apache's release cycle, would these patches be included in
the next CDH build that includes Flume?  (I'm not sure what the schedule
for that is.)



On Mon, Sep 8, 2014 at 4:55 PM, Hari Shreedharan <hshreedharan@cloudera.com> wrote:

> Flume releases are once every few months - since we just had one a couple
> of months back, I don't think there will be one happening right away.
>
> Michael Diamant wrote:
>
> Hari, thank you for your quick reply.  A follow-up question to help me
> figure out how best to proceed on my end:  Can you provide an estimate
> as to when the next Flume release will occur?
>
> On Mon, Sep 8, 2014 at 4:07 PM, Hari Shreedharan
> <hshreedharan@cloudera.com> wrote:
>     This patch should address the issue, if enabled:
> https://git-wip-us.apache.org/repos/asf?p=flume.git;a=commitdiff;h=69fd6b3ad5e5b9ae6f1293b3d8e57ed57fd6701c;hp=f15f20785262ac3cb3e35c2a12e669b7a836d35f
>     It will be part of the next Flume release (or CDH5.2.0).
>     --
>     Thanks,
>     Hari
>     Michael Diamant <diamant.michael@gmail.com>
>     September 8, 2014 at 12:58 PM
>     My team uses Flume 1.4.0 packaged with CDH5.0.2 via an embedded
>     agent to write to a file channel.  In a previous thread started
>     by my colleague, "FileChannel Replays consistently take a long
>     time", and the associated issue,
>     https://issues.apache.org/jira/browse/FLUME-2450, it was
>     suggested that we use a backup checkpoint directory to avoid
>     lengthy replays.  When I enabled the backup checkpoint directory,
>     iotop showed my application, which runs the embedded agent, at
>     nearly 100% IO.  This level of IO persists for about 30 seconds,
>     rendering the application unusable during that time.
>
>     For comparison, I also monitored via iotop with the backup
>     checkpoint disabled.  In that case, IO activity lasts for at most
>     several seconds.  That is, there is a qualitative difference when
>     the backup checkpoint directory is enabled.  I also tried deleting
>     the existing checkpoint and data directories to start with a clean
>     slate.  The results of those experiments are in line with the
>     observations above.
>
>     Is this expected behavior when using a backup checkpoint
>     directory?  Is there any way to reduce the amount of IO?  I
>     appreciate feedback and insights because the current behavior is
>     untenable for a production environment.
>     Thank you,
>     Michael
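
For readers following the thread, the backup checkpoint discussed above is usually switched on with two file channel properties, `useDualCheckpoints` and `backupCheckpointDir`. The fragment below is only a sketch in the standard agent properties-file form. The agent name `agent`, the channel name `fc`, and all paths are hypothetical placeholders, not values taken from this thread. An embedded agent is configured through a properties map rather than a file, so the key prefix would differ there.

```properties
# Hypothetical agent ("agent") and channel ("fc") names; paths are placeholders.
agent.channels = fc
agent.channels.fc.type = file
agent.channels.fc.checkpointDir = /var/flume/checkpoint
agent.channels.fc.dataDirs = /var/flume/data

# Enable the backup checkpoint. The backup directory must not be the same as
# the checkpoint directory or any of the data directories.
agent.channels.fc.useDualCheckpoints = true
agent.channels.fc.backupCheckpointDir = /var/flume/checkpoint-backup
```

Placing the backup directory on a different physical disk from the primary checkpoint may spread the extra writes, though whether that helps with the start-up IO described here is untested.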
