flume-user mailing list archives

From "Wang, Yongkun | Yongkun | BDD" <yongkun.w...@mail.rakuten.com>
Subject Re: Problems with HDFS Sink (file rolling)
Date Thu, 02 Aug 2012 00:58:48 GMT
I remember having a similar experience with 1.1.0.
I suggest downloading 1.2.0 and trying again.

Regards,
Yongkun Wang


On 12/08/01 21:41, "Christian Schroer" <cschroer@autoscout24.com> wrote:

>Hi,
>
>I have some trouble setting up the HDFS sink in Flume-NG (CDH3U4, 1.1.0):
>
>Here's my sink configuration:
>
>agent.sinks.hdfsSinkSMP.type = hdfs
>agent.sinks.hdfsSinkSMP.channel = memoryChannel
>agent.sinks.hdfsSinkSMP.hdfs.filePrefix = flumenode1
>agent.sinks.hdfsSinkSMP.hdfs.fileType = SequenceFile
>agent.sinks.hdfsSinkSMP.hdfs.codeC = gzip
>agent.sinks.hdfsSinkSMP.hdfs.rollCount = 0
>agent.sinks.hdfsSinkSMP.hdfs.batchSize = 1
>agent.sinks.hdfsSinkSMP.hdfs.rollInterval = 15
>agent.sinks.hdfsSinkSMP.hdfs.rollSize = 0
>agent.sinks.hdfsSinkSMP.hdfs.path = hdfs://namenode/user/hive/warehouse/someDatabase.db/someTable/%Y-%m-%d/%H00/%M/somePartion
>
>Events are generated by a SyslogTcp source. We write the data into Hive
>partitions. This works, but it leaves a lot of .tmp files open. I disabled
>event-count and size-based file rolling and enabled only the interval, so
>that files are closed after 15 seconds. But Flume keeps files open much
>longer than 15 seconds (sometimes for hours, or it never closes them).
>Stopping Flume also leaves .tmp files in those directories. Sometimes it
>opens new files in partitions that have no data. Maybe I'm
>doing the file rolling completely wrong?
>
>Some Hive jobs use data that is 5 minutes old, but if Flume renames a file
>after the job starts, the job fails. That's why I want the files closed
>after 15 seconds. New files are not a problem.
>
>Does anyone have an idea?
>
>Best regards,
>Christian
>
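The hdfs.path in the quoted configuration uses the escape sequences %Y-%m-%d/%H00/%M, so the expanded directory changes every minute. As far as I understand the HDFS sink, it keeps a separate open file per distinct expanded path, so each minute directory gets its own .tmp file that has to be rolled independently. The sketch approximates that expansion with Python's strftime, which happens to use the same codes. expand_bucket_path is a hypothetical helper, not Flume's own escaping code, and it assumes the event carries a millisecond timestamp header interpreted as UTC.

```python
from datetime import datetime, timezone

# Path template copied from the sink configuration quoted above.
PATH_TEMPLATE = ("hdfs://namenode/user/hive/warehouse/someDatabase.db"
                 "/someTable/%Y-%m-%d/%H00/%M/somePartion")


def expand_bucket_path(template: str, timestamp_ms: int) -> str:
    """Hypothetical helper: approximate Flume's time escapes via strftime.

    Assumes a millisecond event timestamp interpreted in UTC; Flume's
    real implementation differs in detail.
    """
    ts = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)
    return ts.strftime(template)


# Two events 40 seconds apart, straddling a minute boundary.
first = int(datetime(2012, 8, 1, 12, 0, 30, tzinfo=timezone.utc).timestamp() * 1000)
second = first + 40_000  # 12:01:10 UTC

print(expand_bucket_path(PATH_TEMPLATE, first))   # .../2012-08-01/1200/00/somePartion
print(expand_bucket_path(PATH_TEMPLATE, second))  # .../2012-08-01/1200/01/somePartion
```

Because the two events land in different directories, they would be written to two separate files, each with its own roll timer. A coarser path (for example, dropping the %M level) would mean fewer files are open at once.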
