flume-user mailing list archives

From Zhiwen Sun <pens...@gmail.com>
Subject Re: Why used space of flie channel buffer directory increase?
Date Wed, 20 Mar 2013 09:20:53 GMT
Thanks for your reply.

I just want to confirm whether the disk space used by the file channel has a limit.

Zhiwen Sun



On Wed, Mar 20, 2013 at 4:06 PM, Hari Shreedharan <hshreedharan@cloudera.com> wrote:

> If you reduce the capacity, the channel will be able to buffer fewer
> events. If you want to reduce the space used when there are only a few
> events remaining, set the config param "maxFileSize" to something
> lower (this is in bytes). I don't advise setting this to lower than a few
> hundred megabytes (in fact, the default value works pretty well; do you
> really need to save 3GB of space?), or else you will end up with a huge
> number of small files if there are many events waiting to be taken from
> the channel.
>
>
> Hari
>
>
> On Wed, Mar 20, 2013 at 12:50 AM, Zhiwen Sun <pensz01@gmail.com> wrote:
>
>> Hi Hari:
>>
>> Does that mean I can reduce the capacity of the file channel to cut down
>> the maximum disk space it uses?
>>
>>
>> Zhiwen Sun
>>
>>
>>
>> On Wed, Mar 20, 2013 at 3:23 PM, Hari Shreedharan <
>> hshreedharan@cloudera.com> wrote:
>>
>>>  Hi,
>>>
>>> Like I mentioned earlier, we will always keep 2 data files in each data
>>> directory (the ".meta" files are metadata associated with the actual data).
>>> Once log-8 is created (when log-7 gets rotated after hitting the maximum
>>> size) and all of the events in log-6 are taken, log-6 will get deleted, but
>>> you will still see log-7 and log-8. So what you are seeing is not
>>> unexpected.
>>>
>>>
>>> Hari
>>>
>>> --
>>> Hari Shreedharan
>>>
>>> On Tuesday, March 19, 2013 at 6:30 PM, Zhiwen Sun wrote:
>>>
>>> Thanks all for your reply.
>>>
>>> @Kenison
>>> I stopped my tail -F | nc program and there are no new event files in HDFS,
>>> so I think no events are arriving. To make sure, I will test again with
>>> JMX enabled.
>>>
>>> @Alex
>>>
>>> The latest log follows. I can't see any exceptions or warnings.
>>>
>>> 13/03/19 15:28:16 INFO hdfs.BucketWriter: Renaming hdfs://127.0.0.1:9000/flume/events/2013-03-19/app.1363660490901.tmp to hdfs://127.0.0.1:9000/flume/events/2013-03-19/app.1363660490901
>>> 13/03/19 15:28:16 INFO hdfs.BucketWriter: Creating hdfs://127.0.0.1:9000/flume/events/2013-03-19/app.1363660490902.tmp
>>> 13/03/19 15:28:17 INFO file.EventQueueBackingStoreFile: Start checkpoint for /home/zhiwensun/.flume/file-channel/checkpoint/checkpoint, elements to sync = 3
>>> 13/03/19 15:28:17 INFO file.EventQueueBackingStoreFile: Updating checkpoint metadata: logWriteOrderID: 1363659953997, queueSize: 0, queueHead: 362981
>>> 13/03/19 15:28:17 INFO file.LogFileV3: Updating log-7.meta currentPosition = 216278208, logWriteOrderID = 1363659953997
>>> 13/03/19 15:28:17 INFO file.Log: Updated checkpoint for file: /home/zhiwensun/.flume/file-channel/data/log-7 position: 216278208 logWriteOrderID: 1363659953997
>>> 13/03/19 15:28:26 INFO hdfs.BucketWriter: Renaming hdfs://127.0.0.1:9000/flume/events/2013-03-19/app.1363660490902.tmp to hdfs://127.0.0.1:9000/flume/events/2013-03-19/app.1363660490902
>>> 13/03/19 15:28:27 INFO hdfs.BucketWriter: Creating hdfs://127.0.0.1:9000/flume/events/2013-03-19/app.1363660490903.tmp
>>> 13/03/19 15:28:37 INFO hdfs.BucketWriter: Renaming hdfs://127.0.0.1:9000/flume/events/2013-03-19/app.1363660490903.tmp to hdfs://127.0.0.1:9000/flume/events/2013-03-19/app.1363660490903
>>> 13/03/19 15:28:37 INFO hdfs.BucketWriter: Creating hdfs://127.0.0.1:9000/flume/events/2013-03-19/app.1363660490904.tmp
>>>
>>> 13/03/19 15:28:47 INFO file.EventQueueBackingStoreFile: Start checkpoint for /home/zhiwensun/.flume/file-channel/checkpoint/checkpoint, elements to sync = 2
>>> 13/03/19 15:28:47 INFO file.EventQueueBackingStoreFile: Updating checkpoint metadata: logWriteOrderID: 1363659954200, queueSize: 0, queueHead: 362981
>>> 13/03/19 15:28:47 INFO file.LogFileV3: Updating log-7.meta currentPosition = 216288815, logWriteOrderID = 1363659954200
>>> 13/03/19 15:28:47 INFO file.Log: Updated checkpoint for file: /home/zhiwensun/.flume/file-channel/data/log-7 position: 216288815 logWriteOrderID: 1363659954200
>>> 13/03/19 15:28:48 INFO hdfs.BucketWriter: Renaming hdfs://127.0.0.1:9000/flume/events/2013-03-19/app.1363660490904.tmp to hdfs://127.0.0.1:9000/flume/events/2013-03-19/app.1363660490904
>>>
>>>
>>> @Hari
>>> Well, 12 hours have passed, and the size of the file channel directory
>>> has not decreased.
>>>
>>> Files in file channel directory:
>>>
>>> -rw-r--r-- 1 zhiwensun zhiwensun    0 2013-03-19 09:15 in_use.lock
>>> -rw-r--r-- 1 zhiwensun zhiwensun 1.0M 2013-03-19 10:11 log-6
>>> -rw-r--r-- 1 zhiwensun zhiwensun   29 2013-03-19 10:12 log-6.meta
>>> -rw-r--r-- 1 zhiwensun zhiwensun 207M 2013-03-19 15:28 log-7
>>> -rw-r--r-- 1 zhiwensun zhiwensun   29 2013-03-19 15:28 log-7.meta
>>> -rw-r--r-- 1 zhiwensun zhiwensun 207M 2013-03-19 15:28 ./file-channel/data/log-7
>>> -rw-r--r-- 1 zhiwensun zhiwensun 29 2013-03-19 10:12 ./file-channel/data/log-6.meta
>>> -rw-r--r-- 1 zhiwensun zhiwensun 29 2013-03-19 15:28 ./file-channel/data/log-7.meta
>>> -rw-r--r-- 1 zhiwensun zhiwensun 0 2013-03-19 09:15 ./file-channel/data/in_use.lock
>>> -rw-r--r-- 1 zhiwensun zhiwensun 1.0M 2013-03-19 10:11 ./file-channel/data/log-6
>>>
>>>
>>>
>>>
>>>
>>>
>>> Zhiwen Sun
>>>
>>>
>>>
>>> On Wed, Mar 20, 2013 at 2:32 AM, Hari Shreedharan <
>>> hshreedharan@cloudera.com> wrote:
>>>
>>> It is possible for the directory size to increase even if no writes are
>>> going into the channel. If the channel size is non-zero and the sink is
>>> still writing events to HDFS, the takes get written to disk as well (so we
>>> know which events in the files were removed when the channel/agent
>>> restarts). Eventually the channel will clean up the files whose events
>>> have all been taken (though it will keep at least 2 files per data
>>> directory, just to be safe).
>>>
>>> --
>>> Hari Shreedharan
>>>
>>> On Tuesday, March 19, 2013 at 10:32 AM, Alexander Alten-Lorenz wrote:
>>>
>>> Hey,
>>>
>>> What does the debug log say? Can you gather the logs and attach them?
>>>
>>> - Alex
>>>
>>> On Mar 19, 2013, at 5:27 PM, "Kenison, Matt" <Matt.Kenison@disney.com>
>>> wrote:
>>>
>>> Check the JMX counter first, to make sure you really are not sending new
>>> events. If not, is it your checkpoint directory or data directory that is
>>> increasing in size?
>>>
>>>
>>> From: Zhiwen Sun <pensz01@gmail.com>
>>> Reply-To: "user@flume.apache.org" <user@flume.apache.org>
>>> Date: Tue, 19 Mar 2013 01:19:19 -0700
>>> To: "user@flume.apache.org" <user@flume.apache.org>
>>> Subject: Why used space of flie channel buffer directory increase?
>>>
>>> hi all:
>>>
>>> I am testing flume-ng on my local machine. The data flow is:
>>>
>>> tail -F file | nc 127.0.0.01 4444 > flume agent > hdfs
>>>
>>> My configuration file is:
>>>
>>> a1.sources = r1
>>> a1.channels = c2
>>>
>>> a1.sources.r1.type = netcat
>>> a1.sources.r1.bind = 192.168.201.197
>>> a1.sources.r1.port = 44444
>>> a1.sources.r1.max-line-length = 1000000
>>>
>>> a1.sinks.k1.type = logger
>>>
>>> a1.channels.c1.type = memory
>>> a1.channels.c1.capacity = 10000
>>> a1.channels.c1.transactionCapacity = 10000
>>>
>>> a1.channels.c2.type = file
>>> a1.sources.r1.channels = c2
>>>
>>> a1.sources.r1.interceptors = i1
>>> a1.sources.r1.interceptors.i1.type = timestamp
>>>
>>> a1.sinks = k2
>>> a1.sinks.k2.type = hdfs
>>> a1.sinks.k2.channel = c2
>>> a1.sinks.k2.hdfs.path = hdfs://127.0.0.1:9000/flume/events/%Y-%m-%d
>>> a1.sinks.k2.hdfs.writeFormat = Text
>>> a1.sinks.k2.hdfs.rollInterval = 10
>>> a1.sinks.k2.hdfs.rollSize = 10000000
>>> a1.sinks.k2.hdfs.rollCount = 0
>>>
>>> a1.sinks.k2.hdfs.filePrefix = app
>>> a1.sinks.k2.hdfs.fileType = DataStream
>>>
>>>
>>>
>>>
>>> It seems that events were collected correctly.
>>>
>>> But there is a problem bothering me: the space used by the file channel
>>> (~/.flume) keeps increasing, even when there are no new events.
>>>
>>> Is my configuration wrong, or is there some other problem?
>>>
>>> thanks.
>>>
>>>
>>> Best regards.
>>>
>>> Zhiwen Sun
>>>
>>>
>>> --
>>> Alexander Alten-Lorenz
>>> http://mapredit.blogspot.com
>>> German Hadoop LinkedIn Group: http://goo.gl/N8pCF
>>>
>>>
>>>
>>>
>>>
>>
>
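
For reference, Hari's advice in the thread about "capacity" and "maxFileSize"
could be written against the file channel c2 from the original configuration
roughly as follows. This is only a sketch: both values are assumptions picked
for illustration and do not come from the thread. maxFileSize is in bytes, and
524288000 is 500 MB, in line with the advice not to go below a few hundred
megabytes.

a1.channels.c2.type = file
# Maximum number of events the channel can buffer (illustrative value)
a1.channels.c2.capacity = 100000
# Rotate to a new data file once the current one reaches about 500 MB
a1.channels.c2.maxFileSize = 524288000

As the thread explains, the channel still keeps at least 2 data files per data
directory, so a lower maxFileSize bounds the size of each file rather than
removing the files.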

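The paths in the log output above (.flume/file-channel/checkpoint and
.flume/file-channel/data under the user's home directory) appear to be the
file channel defaults, since the original configuration sets neither. A sketch
that names them explicitly through the file channel's "checkpointDir" and
"dataDirs" properties would make it easier to answer Matt's question about
which directory is growing. The two paths below are hypothetical.

# Hypothetical locations; any writable directories would do
a1.channels.c2.checkpointDir = /var/flume/checkpoint
a1.channels.c2.dataDirs = /var/flume/data
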