flume-user mailing list archives

From Paul Chavez <pcha...@verticalsearchworks.com>
Subject RE: Writing to HDFS from multiple HDFS agents (separate machines)
Date Thu, 14 Mar 2013 22:31:03 GMT
You can use a Host Interceptor on the agents running an HDFS sink, and then reference %{host} in the hdfs.filePrefix property. This isn't really documented: the docs only mention using these tokens in the hdfs.path property, but they work for hdfs.filePrefix as well.

Here are some excerpts of a test config I have that does just that:

#define the interceptor on the source
staging2.sources.httpSource_stg.interceptors = iHost
staging2.sources.httpSource_stg.interceptors.iHost.type = host
staging2.sources.httpSource_stg.interceptors.iHost.useIP = false

#use the header the interceptor added in the filePrefix
staging2.sinks.hdfs_FilterLogs.type = hdfs
staging2.sinks.hdfs_FilterLogs.channel = mc_FilterLogs
staging2.sinks.hdfs_FilterLogs.hdfs.path = /flume_stg/FilterLogsJSON/%Y%m%d
staging2.sinks.hdfs_FilterLogs.hdfs.filePrefix = %{host}
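For what it's worth, with a prefix like that each sink writes files whose names begin with the agent's hostname, so two agents never collide on the same file. Assuming a hypothetical agent hostname of collector01 and Flume's default timestamp-based file naming, the resulting paths look roughly like:

```
/flume_stg/FilterLogsJSON/20130314/collector01.1363300263001.tmp   (while the file is open)
/flume_stg/FilterLogsJSON/20130314/collector01.1363300263001       (after the file rolls)
```

The .tmp suffix is the default hdfs.inUseSuffix and disappears once the file is rolled and closed.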

Hope that helps,
Paul Chavez

From: Gary Malouf [mailto:malouf.gary@gmail.com]
Sent: Thursday, March 14, 2013 2:55 PM
To: user
Subject: Writing to HDFS from multiple HDFS agents (separate machines)

Hi guys,

I'm new to Flume (and HDFS, for that matter), using the version packaged with CDH4 (1.3.0), and
was wondering how others maintain distinct file names per HDFS sink.

My initial thought is to create a separate HDFS sub-directory for each sink, though I feel
the better way would be to somehow prefix each file with a unique sink ID.  Are there
any patterns others are following for this?

