flume-user mailing list archives

From Paul Chavez <pcha...@ntent.com>
Subject RE: HDFS sink: "clever" routing
Date Wed, 15 Oct 2014 15:57:04 GMT
Yes, that will work fine. From experience, I can say you should definitely account for the
possibility of the 'tenant' and 'data_type' headers being corrupted or missing outright.
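
For reference, a rough sketch of what such a sink path could look like (agent and sink names
here are just placeholders; the %{header} escape substitutes the value of the named event
header, and the %Y/%m/%d escapes need a 'timestamp' header or hdfs.useLocalTimeStamp = true):

agent1.sinks.hdfs-sink1.type = hdfs
agent1.sinks.hdfs-sink1.hdfs.path = hdfs://server/data/%{tenant}/%{data_type}/%Y/%m/%d
agent1.sinks.hdfs-sink1.hdfs.fileSuffix = .csv
agent1.sinks.hdfs-sink1.hdfs.useLocalTimeStamp = true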

At my org we have a similar setup where we auto-bucket on a 'logSubType' header that our application
adds to the initial Flume event. To keep channels from blocking if this header goes missing,
we have a static interceptor that adds the value 'MissingSubType' if the header does not exist.
This setup has worked well for us across dozens of separate log streams for over a year.
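
A minimal sketch of that interceptor config (source and interceptor names are placeholders;
with preserveExisting = true the static interceptor only adds the header when it is absent):

agent1.sources.source1.interceptors = defaultSubType
agent1.sources.source1.interceptors.defaultSubType.type = static
agent1.sources.source1.interceptors.defaultSubType.preserveExisting = true
agent1.sources.source1.interceptors.defaultSubType.key = logSubType
agent1.sources.source1.interceptors.defaultSubType.value = MissingSubType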

Hope that helps,
Paul Chavez


-----Original Message-----
From: Jean-Philippe Caruana [mailto:jp@target2sell.com] 
Sent: Wednesday, October 15, 2014 7:03 AM
To: user@flume.apache.org
Subject: HDFS sink: "clever" routing

Hi,

I am new to Flume (and to HDFS), so I hope my question is not stupid.

I have a multi-tenant application (about 100 different customers as of now) and 16 different
data types.

(In production, we have approx. 15 million messages/day through our
RabbitMQ)

I want to write all my events to HDFS, separated by tenant, data type, and date, like this:
/data/{tenant}/{data_type}/2014/10/15/file-08.csv

Is it possible with one sink definition? I don't want to duplicate configuration, and new
clients arrive every week or so.

In the documentation, I see:
agent1.sinks.hdfs-sink1.hdfs.path = hdfs://server/events/%Y/%m/%d/%H/

Is this possible?
agent1.sinks.hdfs-sink1.hdfs.path =
hdfs://server/events/%tenant/%type/%Y/%m/%d/%H/

I want to write to different folders according to my incoming data.

Thanks

--
Jean-Philippe Caruana
http://www.barreverte.fr
