flume-user mailing list archives

From Anandkumar Lakshmanan <an...@orzota.com>
Subject Re: why lots of tmp files in hdfs
Date Thu, 04 Sep 2014 04:58:43 GMT
Hi,

You can control when files are rolled, and therefore how large the files
stored in HDFS get, with the following properties:

* hdfs.rollInterval ---> Number of seconds to wait before rolling the
current file (0 = never roll based on time interval). The default value is
30 seconds.

* hdfs.rollSize ---> File size that triggers a roll, in bytes (0 = never
roll based on file size). The default value is 1024 bytes.

* hdfs.rollCount ---> Number of events written to a file before it is
rolled (0 = never roll based on number of events). The default value is 10.

You choose whether files roll based on file size, on the number of events
in a file, or on the number of seconds to wait before rolling the file.
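
For example, here is a sketch of a sink that rolls on file size only, with
the other two triggers turned off. The 128 MB threshold is just an
illustrative value, not a recommendation; a1 and sinks1 are the agent and
sink names from your configuration. As far as I know, any trigger left
non-zero stays active alongside the others, so the file rolls on whichever
one fires first.

```properties
# Roll on file size only (illustrative threshold: 128 MB = 134217728 bytes)
a1.sinks.sinks1.hdfs.rollSize = 134217728
# 0 disables the time-based trigger
a1.sinks.sinks1.hdfs.rollInterval = 0
# 0 disables the event-count trigger
a1.sinks.sinks1.hdfs.rollCount = 0
```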

In your configuration you specified "rollInterval = 300", i.e. 300 seconds
(5 minutes) to wait before rolling the current file.


* idleTimeout ---> Timeout after which inactive files get closed (0 =
disable automatic closing of idle files).

Also, you specified "idleTimeout = 1800000". That value is in seconds, so it
is 30000 minutes (about 20.8 days): an inactive file is closed only after it
has been idle for that long. This is the reason why you are getting all the
files in the .tmp state.
Reduce this value to 30 or 60 seconds and it should work well.
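
A minimal sketch of that change, assuming you keep the five-minute
rollInterval from your configuration and that 60 seconds of inactivity is an
acceptable cutoff for your traffic (adjust as needed):

```properties
# Keep rolling every 5 minutes, as in the original configuration
a1.sinks.sinks1.hdfs.rollInterval = 300
# Close a file after 60 seconds without writes (value is in seconds)
a1.sinks.sinks1.hdfs.idleTimeout = 60
```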

Thanks
Anand.




On 09/04/2014 09:09 AM, Wan Yi(武汉_技术部_搜索与精准化_万毅) wrote:
>
> Hi, all
>
> I am using the HDFS sink to store logs, and I see lots of .tmp files (more
> than 10) in HDFS. Does anybody know why?
>
> Below is my HDFS sink configuration.
>
> Our Hadoop version is: Hadoop 2.3.0-cdh5.0.2
>
> Flume version is: 1.4.0
>
> a1.sinks.sinks1.type = hdfs
> a1.sinks.sinks1.channel = ch1
> a1.sinks.sinks1.hdfs.path = hdfs://xxxxxxx
> a1.sinks.sinks1.hdfs.filePrefix = events
> a1.sinks.sinks1.hdfs.batchSize = 1000
> a1.sinks.sinks1.hdfs.rollCount = 0
> a1.sinks.sinks1.hdfs.rollSize = 0
> a1.sinks.sinks1.hdfs.rollInterval = 300
> a1.sinks.sinks1.hdfs.idleTimeout = 1800000
> a1.sinks.sinks1.hdfs.callTimeout = 180000
> a1.sinks.sinks1.hdfs.threadsPoolSize = 250
> a1.sinks.sinks1.hdfs.writeFormat = Text
> a1.sinks.sinks1.hdfs.fileType = DataStream
>
>
> *Best Regards*
>
> *Wayne Wan*

