flume-user mailing list archives

From Anat Rozenzon <a...@viber.com>
Subject Fwd: 4 times disk consumption?
Date Tue, 10 Sep 2013 08:46:35 GMT
I tried opening the sink now, but it seems that it can't take events from
the channel (because it reached the minimumRequiredSpace limit); see the
error message below.
Is there any way I can continue?

10 Sep 2013 04:27:23,282 WARN
[SinkRunner-PollingRunner-LoadBalancingSinkProcessor]
(org.apache.flume.sink.LoadBalancingSinkProcessor.process:158)  - Sink
failed to consume event. Attempting next sink if available.
java.lang.IllegalStateException: Channel closed [channel=fileChannel]. Due
to java.io.IOException: Usable space exhaused, only 402567168 bytes
remaining, required 524288000 bytes
        at
org.apache.flume.channel.file.FileChannel.createTransaction(FileChannel.java:352)
        at
org.apache.flume.channel.BasicChannelSemantics.getTransaction(BasicChannelSemantics.java:122)
        at
org.apache.flume.sink.AbstractRpcSink.process(AbstractRpcSink.java:333)
        at
org.apache.flume.sink.LoadBalancingSinkProcessor.process(LoadBalancingSinkProcessor.java:154)
        at
org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
        at java.lang.Thread.run(Unknown Source)
Caused by: java.io.IOException: Usable space exhaused, only 402567168 bytes
remaining, required 524288000 bytes
        at org.apache.flume.channel.file.Log.writeCheckpoint(Log.java:985)
        at org.apache.flume.channel.file.Log.replay(Log.java:472)
        at
org.apache.flume.channel.file.FileChannel.start(FileChannel.java:302)
        at
org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)
        at java.util.concurrent.Executors$RunnableAdapter.call(Unknown
Source)
        at java.util.concurrent.FutureTask$Sync.innerRunAndReset(Unknown
Source)
        at java.util.concurrent.FutureTask.runAndReset(Unknown Source)
        at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(Unknown
Source)
        at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown
Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
Source)
        ... 1 more
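
For reference, the shortfall behind this failure can be worked out from the two byte counts in the exception message. The following is only a minimal Python sketch, assuming the message format shown in the trace above; the helper name `space_shortfall` is hypothetical and not part of Flume.

```python
import re

# Matches the two byte counts in the message as it appears in the log above.
# \s+ tolerates the line wrap between "bytes" and "remaining".
_PATTERN = re.compile(r"only (\d+) bytes\s+remaining, required (\d+) bytes")


def space_shortfall(message: str) -> int:
    """Return how many more bytes must be free for the space check to pass."""
    match = _PATTERN.search(message)
    if match is None:
        raise ValueError("message does not contain the expected byte counts")
    remaining, required = (int(group) for group in match.groups())
    # Never report a negative shortfall if enough space is already free.
    return max(required - remaining, 0)


# Message text copied from the trace, including the log's own spelling.
message = (
    "java.io.IOException: Usable space exhaused, only 402567168 bytes\n"
    "remaining, required 524288000 bytes"
)
shortfall = space_shortfall(message)
print(f"{shortfall} bytes short ({shortfall / 2**20:.1f} MiB)")
# prints "121720832 bytes short (116.1 MiB)"
```

With the numbers from this trace, roughly 116 MiB more would have to be free on the volume before the check passes. The required figure of 524288000 bytes is exactly 500 MiB, which appears to correspond to the minimumRequiredSpace setting mentioned above.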


---------- Forwarded message ----------
From: Anat Rozenzon <anat@viber.com>
Date: Tue, Sep 10, 2013 at 8:43 AM
Subject: Re: 4 times disk consumption?
To: user@flume.apache.org


Thanks Brock!

I see a parameter called maxFileSize on the file channel:
maxFileSize 2146435071 Max size (in bytes) of a single log file
Is that what you mean?

However, I have 3 log files (and would probably have more if the channel
hadn't reached minimumRequiredSpace); together they use more than the 2G
default of this parameter.


On Tue, Sep 10, 2013 at 8:23 AM, Brock Noland <brock@cloudera.com> wrote:

> If you are concerned about disk space consumption you should lower the
> max log size on the file channel. The exact parameter is in the docs.
>
> On Tue, Sep 10, 2013 at 12:17 AM, Anat Rozenzon <anat@viber.com> wrote:
> > After leaving flume to run in this state (sink is not sending the events),
> > the disk space has now grown to 3.4G!
> > I see the same files COMPLETED as yesterday so no new events were read into
> > the channel, yet the channel keeps growing!
> >
> > I see this file structure under the file channel work folder:
> >
> > [root@HTS4 old_logs]# du -sh flume/filechannel/data/*
> > 0       flume/filechannel/data/in_use.lock
> > 1.6G    flume/filechannel/data/log-1
> > 4.0K    flume/filechannel/data/log-1.meta
> > 1.6G    flume/filechannel/data/log-2
> > 4.0K    flume/filechannel/data/log-2.meta
> > 338M    flume/filechannel/data/log-3
> > 4.0K    flume/filechannel/data/log-3.meta
> >
> > Any way to avoid this behavior?
> >
> > ---------- Forwarded message ----------
> > From: Anat Rozenzon <anat@viber.com>
> > Date: Mon, Sep 9, 2013 at 3:00 PM
> > Subject: 4 times disk consumption?
> > To: user@flume.apache.org
> >
> >
> > Hi,
> >
> > I have a directory spooler connected to a file channel, currently with a
> > non-working sink.
> > Channel capacity is 200M (events?!). Since the sink is not working, the
> > channel fills up.
> >
> > However, I see that although the original files' total size is 150M, the
> > full file channel is using almost 4 times that disk space (i.e. 550M).
> >
> > Any idea why? Is this the expected ratio between original size and file
> > channel disk usage?
> >
> > Thanks
> > Anat
> >
>
>
>
> --
> Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org
>
