From Chris Neal <cwn...@gmail.com>
Subject Simple HDFS Sink file rolling question please.
Date Thu, 21 Mar 2013 20:21:02 GMT
Hi :)

I have an ExecSource running tail -F on a set of log files that log4j
rotates nightly.  I want my HDFS Sink to roll its files when log4j rolls
the logs.  I tried setting all the "roll" parameters to 0, thinking that a
new file handle from the ExecSource would cause the current file in HDFS to
be closed and a new file to be created.  Instead, I'm seeing only the new
file created, and the previous day's file is still there as an unclosed
.tmp file.

What configuration would achieve the behavior I'm after?  I was thinking of
a rollInterval of 24 hours, but wouldn't that cause the HDFS Sink to roll
the file at a different time than log4j rolled it?

Thanks for the time :)

Here is my HDFS Sink setup currently:

# hdfs-hadoopjt01_1-sink properties
hadoopjt01.sinks.hdfs-hadoopjt01_1-sink.type = hdfs
hadoopjt01.sinks.hdfs-hadoopjt01_1-sink.hdfs.path =
hadoopjt01.sinks.hdfs-hadoopjt01_1-sink.hdfs.filePrefix =
hadoopjt01.sinks.hdfs-hadoopjt01_1-sink.hdfs.rollInterval = 0
hadoopjt01.sinks.hdfs-hadoopjt01_1-sink.hdfs.rollSize = 0
hadoopjt01.sinks.hdfs-hadoopjt01_1-sink.hdfs.rollCount = 0
hadoopjt01.sinks.hdfs-hadoopjt01_1-sink.hdfs.batchSize = 10000
hadoopjt01.sinks.hdfs-hadoopjt01_1-sink.hdfs.threadsPoolSize = 8
hadoopjt01.sinks.hdfs-hadoopjt01_1-sink.hdfs.rollTimerPoolSize = 5
hadoopjt01.sinks.hdfs-hadoopjt01_1-sink.hdfs.codeC = GzipCodec
hadoopjt01.sinks.hdfs-hadoopjt01_1-sink.hdfs.fileType = CompressedStream
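
For reference, here is a rough, untested sketch of the two variants I am
weighing.  Variant A is the 24-hour rollInterval idea mentioned above
(hdfs.rollInterval is given in seconds, so 24 hours is 86400).  Variant B
is an assumption on my part: bucket files by date in hdfs.path and let
hdfs.idleTimeout close the previous day's file once it stops receiving
events.  The path /flume/logs and the source name tail-source are
hypothetical placeholders, not my real values, and Variant B assumes a
Flume version that supports hdfs.idleTimeout.

```properties
# Variant A (sketch): roll purely on time.
# 24 hours = 86400 seconds.  As far as I understand, the timer starts when
# the sink opens the file, not at midnight, so it may drift from log4j.
hadoopjt01.sinks.hdfs-hadoopjt01_1-sink.hdfs.rollInterval = 86400
hadoopjt01.sinks.hdfs-hadoopjt01_1-sink.hdfs.rollSize = 0
hadoopjt01.sinks.hdfs-hadoopjt01_1-sink.hdfs.rollCount = 0

# Variant B (sketch): one directory per day, close idle files.
# Use instead of Variant A, not together with it.
# The date escapes need a timestamp header on each event, added here by a
# timestamp interceptor on a hypothetical source named tail-source.
hadoopjt01.sources.tail-source.interceptors = ts
hadoopjt01.sources.tail-source.interceptors.ts.type = timestamp
# Hypothetical path; %Y-%m-%d expands from the event's timestamp header.
hadoopjt01.sinks.hdfs-hadoopjt01_1-sink.hdfs.path = /flume/logs/%Y-%m-%d
hadoopjt01.sinks.hdfs-hadoopjt01_1-sink.hdfs.rollInterval = 0
hadoopjt01.sinks.hdfs-hadoopjt01_1-sink.hdfs.rollSize = 0
hadoopjt01.sinks.hdfs-hadoopjt01_1-sink.hdfs.rollCount = 0
# Close a file after 10 minutes without writes (value in seconds).
hadoopjt01.sinks.hdfs-hadoopjt01_1-sink.hdfs.idleTimeout = 600
```

Neither variant reacts to the log4j rotation itself.  Variant B would split
files on the calendar day of each event's timestamp, which may or may not
line up exactly with when log4j rolls.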

